1.1 A Historical Perspective

The concept of digital data manipulation has made a dramatic impact on our society. One 
has long grown accustomed to the idea of digital computers. Evolving steadily from main- 
frame and minicomputers, personal and laptop computers have proliferated into daily life. 
More significant, however, is a continuous trend towards digital solutions in all other 
areas of electronics. Instrumentation was one of the first noncomputing domains where the 
potential benefits of digital data manipulation over analog processing were recognized. 
Other areas such as control were soon to follow. Only recently have we witnessed the con- 
version of telecommunications and consumer electronics towards the digital format. 
Increasingly, telephone data is transmitted and processed digitally over both wired and 
wireless networks. The compact disk has revolutionized the audio world, and digital video 
is following in its footsteps. 

The idea of implementing computational engines using an encoded data format is by 
no means an idea of our times. In the early nineteenth century, Babbage envisioned large- 
scale mechanical computing devices, called Difference Engines [Swade93]. Although 
these engines use the decimal number system rather than the binary representation now 
common in modern electronics, the underlying concepts are very similar. The Analytical 
Engine, developed in 1834, was perceived as a general-purpose computing machine, with 
features strikingly close to modern computers. Besides executing the basic repertoire of 
operations (addition, subtraction, multiplication, and division) in arbitrary sequences, the 
machine operated in a two-cycle sequence, called “store” and “mill” (execute), similar to 
current computers. It even used pipelining to speed up the execution of the addition opera- 
tion! Unfortunately, the complexity and the cost of the designs made the concept impracti- 
cal. For instance, the design of Difference Engine I (part of which is shown in Figure 1.1) 
required 25,000 mechanical parts at a total cost of £17,470 (in 1834!). 




Figure 1.1 Working part of Babbage’s Difference Engine I (1832), the first known
automatic calculator (from [Swade93], courtesy of the Science Museum of London).

The electrical solution turned out to be more cost effective. Early digital electronics
systems were based on magnetically controlled switches (or relays). They were mainly 
used in the implementation of very simple logic networks. Examples of such are train 
safety systems, where they are still being used at present. The age of digital electronic 
computing only started in full with the introduction of the vacuum tube. While originally 
used almost exclusively for analog processing, it was realized early on that the vacuum 
tube was useful for digital computations as well. Soon complete computers were realized. 
The era of the vacuum tube based computer culminated in the design of machines such as 
the ENIAC (intended for computing artillery firing tables) and the UNIVAC I (the first 
successful commercial computer). To get an idea about integration density, the ENIAC
was 80 feet long, 8.5 feet high and several feet wide and incorporated 18,000 vacuum 
tubes. It became rapidly clear, however, that this design technology had reached its limits. 
Reliability problems and excessive power consumption made the implementation of larger 
engines economically and practically infeasible. 

All changed with the invention of the transistor at Bell Telephone Laboratories in 
1947 [Bardeen48], followed by the introduction of the bipolar transistor by Shockley in
1949 [Schockley49].¹ It took till 1956 before this led to the first bipolar digital logic gate,
introduced by Harris [Harris56], and even more time before this translated into a set of 
integrated-circuit commercial logic gates, called the Fairchild Micrologic family 
[Norman60]. The first truly successful IC logic family, TTL (Transistor-Transistor Logic) 
was pioneered in 1962 [Beeson62]. Other logic families were devised with higher perfor- 
mance in mind. Examples of these are the current switching circuits that produced the first 
subnanosecond digital gates and culminated in the ECL (Emitter-Coupled Logic) family
[Masaki74]. TTL had the advantage, however, of offering a higher integration density and 
was the basis of the first integrated circuit revolution. In fact, the manufacturing of TTL 
components is what spear-headed the first large semiconductor companies such as Fair- 
child, National, and Texas Instruments. The family was so successful that it composed the 
largest fraction of the digital semiconductor market until the 1980s. 

Ultimately, bipolar digital logic lost the battle for hegemony in the digital design 
world for exactly the reasons that haunted the vacuum tube approach: the large power con- 
sumption per gate puts an upper limit on the number of gates that can be reliably integrated 
on a single die, package, housing, or box. Although attempts were made to develop high 
integration density, low-power bipolar families (such as I²L, Integrated Injection Logic
[Hart72]), the torch was gradually passed to the MOS digital integrated circuit approach. 

The basic principle behind the MOSFET transistor (originally called IGFET) was 
proposed in a patent by J. Lilienfeld (Canada) as early as 1925, and, independently, by O. 
Heil in England in 1935. Insufficient knowledge of the materials and gate stability prob- 
lems, however, delayed the practical usability of the device for a long time. Once these 
were solved, MOS digital integrated circuits started to take off in full in the early 1970s. 
Remarkably, the first MOS logic gates introduced were of the CMOS variety 
[Wanlass63], and this trend continued till the late 1960s. The complexity of the manufacturing
process delayed the full exploitation of these devices for two more decades. Instead,
the first practical MOS integrated circuits were implemented in PMOS-only logic and
were used in applications such as calculators. The second age of the digital integrated circuit
revolution was inaugurated with the introduction of the first microprocessors by Intel
in 1972 (the 4004) [Faggin72] and 1974 (the 8080) [Shima74]. These processors were
implemented in NMOS-only logic, which has the advantage of higher speed over the
PMOS logic. Simultaneously, MOS technology enabled the realization of the first high-density
semiconductor memories. For instance, the first 4Kbit MOS memory was introduced
in 1970 [Hoff70].

1 An intriguing overview of the evolution of digital integrated circuits can be found in [Murphy93].
(Most of the data in this overview has been extracted from this reference.) It is accompanied by some of the
historically ground-breaking publications in the domain of digital IC’s.

These events were at the start of a truly astounding evolution towards ever higher 
integration densities and speed performances, a revolution that is still in full swing right 
now. The road to the current levels of integration has not been without hindrances, how- 
ever. In the late 1970s, NMOS-only logic started to suffer from the same plague that made 
high-density bipolar logic unattractive or infeasible: power consumption. This realization, 
combined with progress in manufacturing technology, finally tilted the balance towards 
the CMOS technology, and this is where we still are today. Interestingly enough, power 
consumption concerns are rapidly becoming dominant in CMOS design as well, and this 
time there does not seem to be a new technology around the corner to alleviate the 
problem. 

Although the large majority of the current integrated circuits are implemented in the 
MOS technology, other technologies come into play when very high performance is at 
stake. An example of this is the BiCMOS technology that combines bipolar and MOS 
devices on the same die. BiCMOS is used in high-speed memories and gate arrays. When 
even higher performance is necessary, other technologies emerge besides the already men- 
tioned bipolar silicon ECL family — Gallium-Arsenide, Silicon-Germanium and even 
superconducting technologies. These technologies only play a very small role in the over- 
all digital integrated circuit design scene. With the ever increasing performance of CMOS, 
this role is bound to be further reduced with time. Hence the focus of this textbook on 
CMOS only. 




1.2 Issues in Digital Integrated Circuit Design 

Integration density and performance of integrated circuits have gone through an astound- 
ing revolution in the last couple of decades. In the 1960s, Gordon Moore, then with Fair- 
child Corporation and later cofounder of Intel, predicted that the number of transistors that 
can be integrated on a single die would grow exponentially with time. This prediction, 
later called Moore’s law, has proven to be amazingly visionary [Moore65]. Its validity is
best illustrated with the aid of a set of graphs. Figure 1.2 plots the integration density of 
both logic IC’s and memory as a function of time. As can be observed, integration com- 
plexity doubles approximately every 1 to 2 years. As a result, memory density has 
increased by more than a thousandfold since 1970. 
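As a rough check on these numbers, the sketch below computes the growth factor implied by a constant doubling period. The function name and the sample doubling periods are illustrative assumptions; they are not read off Figure 1.2.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Growth in integration density after `years`, assuming the density
    doubles once every `doubling_period_years` (an idealized model)."""
    return 2.0 ** (years / doubling_period_years)


# Illustrative sweep over the "1 to 2 years" range mentioned in the text.
for period in (1.0, 1.5, 2.0):
    factor = growth_factor(20, period)
    print(f"doubling every {period} years: x{factor:,.0f} after 20 years")
```

Even at the slow end of the range (doubling every two years), twenty years already give a factor of 2^10 = 1024, which is consistent with the "more than a thousandfold" claim.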

An intriguing case study is offered by the microprocessor. From its inception in the 
early seventies, the microprocessor has grown in performance and complexity at a steady 
and predictable pace. The transistor counts for a number of landmark designs are collected 
in Figure 1.3. The million-transistor/chip barrier was crossed in the late eighties. Clock 
frequencies double every three years and have reached into the GHz range. This is illustrated
in Figure 1.4, which plots the microprocessor trends in terms of performance at the
beginning of the 21st century. An important observation is that, as of now, these trends
have not shown any signs of a slow-down.

(a) Trends in logic IC complexity
(b) Trends in memory complexity
Figure 1.2 Evolution of integration complexity of logic ICs and memories as a function of time.

It should be no surprise to the reader that this revolution has had a profound impact 
on how digital circuits are designed. Early designs were truly hand-crafted. Every transis- 
tor was laid out and optimized individually and carefully fitted into its environment. This 
is adequately illustrated in Figure 1.5a, which shows the design of the Intel 4004 micro- 
processor. This approach is, obviously, not appropriate when more than a million devices 
have to be created and assembled. With the rapid evolution of the design technology, 
time-to-market is one of the crucial factors in the ultimate success of a component.

Figure 1.3 Historical evolution of microprocessor transistor count (from [Intel01]).

Figure 1.4 Microprocessor performance trends at the beginning of the 21st century.

Designers have, therefore, increasingly adhered to rigid design methodologies and strate- 
gies that are more amenable to design automation. The impact of this approach is apparent 
from the layout of one of the later Intel microprocessors, the Pentium® 4, shown in Figure 
1.5b. Instead of the individualized approach of the earlier designs, a circuit is constructed 
in a hierarchical way: a processor is a collection of modules, each of which consists of a 
number of cells on its own. Cells are reused as much as possible to reduce the design effort 
and to enhance the chances for a first-time-right implementation. The fact that this hierar- 
chical approach is at all possible is the key ingredient for the success of digital circuit 
design and also explains why, for instance, very large scale analog design has never 
caught on. 

The obvious next question is why such an approach is feasible in the digital world 
and not (or to a lesser degree) in analog designs. The crucial concept here, and the most 
important one in dealing with the complexity issue, is abstraction. At each design level, 
the internal details of a complex module can be abstracted away and replaced by a black 
box view or model. This model contains virtually all the information needed to deal with 
the block at the next level of hierarchy. For instance, once a designer has implemented a 
multiplier module, its performance can be defined very accurately and can be captured in a 
model. The performance of this multiplier is in general only marginally influenced by the 
way it is utilized in a larger system. For all purposes, it can hence be considered a black 
box with known characteristics. As there exists no compelling need for the system 
designer to look inside this box, design complexity is substantially reduced. The impact of 
this divide-and-conquer approach is dramatic. Instead of having to deal with a myriad of
elements, the designer has to consider only a handful of components, each of which is
characterized in performance and cost by a small number of parameters.

This is analogous to a software designer using a library of software routines such as 
input/output drivers. Someone writing a large program does not bother to look inside those 
library routines. The only thing he cares about is the intended result of calling one of those 
modules. Imagine what writing software programs would be like if one had to fetch every 
bit individually from the disk and ensure its correctness instead of relying on handy “file 
open” and “get string” operators.

Typically used abstraction levels in digital circuit design are, in order of increasing
abstraction, the device, circuit, gate, functional module (e.g., adder) and system levels
(e.g., processor), as illustrated in Figure 1.6. A semiconductor device is an entity with a
very complex behavior. No circuit designer will ever seriously consider the solid-state
physics equations governing the behavior of the device when designing a digital gate. 
Instead he will use a simplified model that adequately describes the input-output behavior 
of the transistor. For instance, an AND gate is adequately described by its Boolean expression
(Z = A · B), its bounding box, the position of the input and output terminals, and the
delay between the inputs and the output. 
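To make the black-box idea concrete, the following sketch shows one way such a gate model could be captured as a data structure. The class name, field names, units, and all numeric values are hypothetical and chosen only for illustration; real cell libraries use their own formats.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class GateModel:
    """Black-box view of a gate: only what the next hierarchy level needs."""
    name: str
    function: Callable[..., bool]               # Boolean expression
    bounding_box: Tuple[float, float]           # (width, height), arbitrary units
    terminals: Dict[str, Tuple[float, float]]   # terminal name -> (x, y) position
    delay: float                                # input-to-output delay, arbitrary units

    def evaluate(self, *inputs: bool) -> bool:
        """Apply the Boolean expression; the internals stay hidden."""
        return self.function(*inputs)


# Hypothetical two-input AND gate, Z = A · B. All numbers are made up.
and2 = GateModel(
    name="AND2",
    function=lambda a, b: a and b,
    bounding_box=(2.0, 5.0),
    terminals={"A": (0.0, 1.0), "B": (0.0, 3.0), "Z": (2.0, 2.0)},
    delay=50.0,
)

print(and2.evaluate(True, True), and2.evaluate(True, False))
```

A designer working at the module level would consult only these few fields, never the transistor-level implementation behind them.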

This design philosophy has been the enabler for the emergence of elaborate com- 
puter-aided design (CAD) frameworks for digital integrated circuits; without it the current 
design complexity would not have been achievable. Design tools include simulation at the 
various complexity levels, design verification, layout generation, and design synthesis. An 
overview of these tools and design methodologies is given in Chapter 8 of this textbook. 

Furthermore, to avoid the redesign and reverification of frequently used cells such 
as basic gates and arithmetic and memory modules, designers most often resort to cell 
libraries. These libraries contain not only the layouts, but also provide complete documentation
and characterization of the behavior of the cells. The use of cell libraries is, for
instance, apparent in the layout of the Pentium® 4 processor (Figure 1.5b). The integer
and floating-point units, just to name a few, contain large sections designed using the
so-called standard cell approach. In this approach, logic gates are placed in rows of cells of
equal height and interconnected using routing channels. The layout of such a block can be 
generated automatically given that a library of cells is available. 

The preceding analysis demonstrates that design automation and modular design 
practices have effectively addressed some of the complexity issues incurred in contempo- 
rary digital design. This leads to the following pertinent question. If design automation 
solves all our design problems, why should we be concerned with digital circuit design at 
all? Will the next-generation digital designer ever have to worry about transistors or para- 
sitics, or is the smallest design entity he will ever consider the gate and the module? 

The truth is that the reality is more complex, and various reasons exist as to why an 
insight into digital circuits and their intricacies will still be an important asset for a long 
time to come. 

• First of all, someone still has to design and implement the module libraries. Semi- 
conductor technologies continue to advance from year to year. Until one has devel- 
oped a fool-proof approach towards “porting” a cell from one technology to another, 
each change in technology — which happens approximately every two 
years — requires a redesign of the library. 

• Creating an adequate model of a cell or module requires an in-depth understanding 
of its internal operation. For instance, to identify the dominant performance parame- 
ters of a given design, one has to recognize the critical timing path first. 

• The library-based approach works fine when the design constraints (speed, cost or 
power) are not stringent. This is the case for a large number of application-specific 
designs, where the main goal is to provide a more integrated system solution, and
performance requirements are easily within the capabilities of the technology.
Unfortunately, for a large number of other products such as microprocessors, success
hinges on high performance, and designers therefore tend to push technology to its 
limits. At that point, the hierarchical approach tends to become somewhat less 
attractive. To resort to our previous analogy to software methodologies, a program- 
mer tends to “customize” software routines when execution speed is crucial; com- 
pilers — or design tools — are not yet to the level of what human sweat or ingenuity 
can deliver. 

• Even more important is the observation that the abstraction-based approach is only 
correct to a certain degree. The performance of, for instance, an adder can be sub- 
stantially influenced by the way it is connected to its environment. The interconnec- 
tion wires themselves contribute to delay as they introduce parasitic capacitances, 
resistances and even inductances. The impact of the interconnect parasitics is bound 
to increase in the years to come with the scaling of the technology. 

• Scaling tends to emphasize some other deficiencies of the abstraction-based model. 
Some design entities tend to be global or external (to resort anew to the software 
analogy). Examples of global factors are the clock signals, used for synchronization 
in a digital design, and the supply lines. Increasing the size of a digital design has a
profound effect on these global signals. For instance, connecting more cells to a supply
line can cause a voltage drop over the wire, which, in its turn, can slow down all
the connected cells. Issues such as clock distribution, circuit synchronization, and 
supply-voltage distribution are becoming more and more critical. Coping with them 
requires a profound understanding of the intricacies of digital circuit design. 

• Another impact of technology evolution is that new design issues and constraints 
tend to emerge over time. A typical example of this is the periodical reemergence of 
power dissipation as a constraining factor, as was already illustrated in the historical 
overview. Another example is the changing ratio between device and interconnect 
parasitics. To cope with these unforeseen factors, one must at least be able to model 
and analyze their impact, requiring once again a profound insight into circuit topol- 
ogy and behavior. 

• Finally, when things can go wrong, they do. A fabricated circuit does not always 
exhibit the exact waveforms one might expect from advance simulations. Deviations 
can be caused by variations in the fabrication process parameters, or by the induc- 
tance of the package, or by a badly modeled clock signal. Troubleshooting a design 
requires circuit expertise. 

For all the above reasons, it is my belief that an in-depth knowledge of digital circuit 
design techniques and approaches is an essential asset for a digital-system designer. Even 
though she might not have to deal with the details of the circuit on a daily basis, the under- 
standing will help her to cope with unexpected circumstances and to determine the domi- 
nant effects when analyzing a design. 

Example 1.1 Clocks Defy Hierarchy 

To illustrate some of the issues raised above, let us examine the impact of deficiencies in one 
of the most important global signals in a design, the clock. The function of the clock signal in 
a digital design is to order the multitude of events happening in the circuit. This task can be 
compared to the function of a traffic light that determines which cars are allowed to move. It 
also makes sure that all operations are completed before the next one starts — a traffic light 
should be green long enough to allow a car or a pedestrian to cross the road. Under ideal cir- 
cumstances, the clock signal is a periodic step waveform with transitions synchronized 
throughout the designed circuit (Figure 1.7a). In light of our analogy, changes in the traffic 
lights should be synchronized to maximize throughput while avoiding accidents. The importance
of the clock alignment concept is illustrated with the example of two cascaded registers,
both operating on the rising edge of the clock φ (Figure 1.7b). Under normal operating conditions,
the input In gets sampled into the first register on the rising edge of φ and appears at the
output exactly one clock period later. This is confirmed by the simulations shown in Figure
1.7c (signal Out).

Due to delays associated with routing the clock wires, it may happen that the clocks 
become misaligned with respect to each other. As a result, the registers are interpreting time 
indicated by the clock signal differently. Consider the case that the clock signal for the second 
register is delayed (or skewed) by a value δ. The rising edge of the delayed clock φ' will
postpone the sampling of the input of the second register. If the time it takes to propagate the
output of the first register to the input of the second is smaller than the clock delay, the latter
will sample the wrong value. This causes the output to change prematurely, as clearly illustrated
in the simulation, where the signal Out' goes high at the first rising edge of φ' instead of
the second one. In terms of our traffic analogy, cars of a first traffic light hit the cars of the
next light that have not left yet.

Figure 1.7 Impact of clock misalignment.

Clock misalignment, or clock skew, as it is normally called, is an important example of 
how global signals may influence the functioning of a hierarchically designed system. Clock 
skew is actually one of the most critical design problems facing the designers of large, high- 
performance systems. 
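The failure mechanism of this example can be mimicked with a very small behavioral model. The sketch below is an idealization under stated assumptions: registers are perfect edge-triggered elements, setup and hold times are ignored, and the only timing quantities are the register-to-register propagation delay and the skew δ. The function name and the sample values are illustrative.

```python
def simulate_cascade(inputs, prop_delay, skew):
    """Return the output of the second register after each clock edge.

    inputs     : value of In at each rising edge of the clock phi
    prop_delay : time for the first register's output to reach the second
    skew       : delay (delta) of the second register's clock phi'
    """
    q1 = 0          # state of the first register
    q2 = 0          # state of the second register (signal Out)
    outputs = []
    for value in inputs:
        new_q1 = value              # first register samples In on phi
        if prop_delay < skew:
            # New data arrives before the late edge of phi': race-through.
            q2 = new_q1
        else:
            # Old data is still present at the edge of phi': correct behavior.
            q2 = q1
        q1 = new_q1
        outputs.append(q2)
    return outputs


pulse = [1, 0, 0, 0]
print(simulate_cascade(pulse, prop_delay=1.0, skew=0.0))  # aligned clocks
print(simulate_cascade(pulse, prop_delay=1.0, skew=2.0))  # skew exceeds delay
```

With aligned clocks the pulse appears at the output one clock period after it is sampled; once the skew exceeds the propagation delay it appears one edge early, which is the premature transition of Out' described above.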






Example 1.2 Power Distribution Networks Defy Hierarchy 

While the clock signal is one example of a global signal that crosses the chip hierarchy 
boundaries, the power distribution network represents another. A digital system requires a 
stable DC voltage to be supplied to the individual gates. To ensure proper operation, this 
voltage should be stable within a few hundred millivolts. The power distribution system 
has to provide this stable voltage in the presence of very large current variations. The 
resistive nature of the on-chip wires and the inductance of the IC package pins make this a 
difficult proposition. For example, the average DC current to be supplied to a 100 W, 1 V
microprocessor equals 100 A! The peak current can easily be twice as large, and current 
demand can readily change from almost zero to this peak value over a short time — in the 
range of 1 nsec or less. This leads to a current variation of 100 GA/sec, which is a truly 
astounding number. 
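The arithmetic behind these figures is short enough to spell out. The sketch below uses I = P/V for the average current and Δi/Δt for the current slew; the function names are illustrative, and the 100 A swing over 1 ns is the assumption that reproduces the 100 GA/sec figure quoted above (a swing up to a peak of twice the average would give twice that rate).

```python
def average_current(power_w: float, supply_v: float) -> float:
    """Average supply current in amperes, from I = P / V."""
    return power_w / supply_v


def current_slew(delta_i_a: float, delta_t_s: float) -> float:
    """Rate of change of the supply current in A/s."""
    return delta_i_a / delta_t_s


i_avg = average_current(power_w=100.0, supply_v=1.0)      # 100 W at 1 V
di_dt = current_slew(delta_i_a=i_avg, delta_t_s=1e-9)     # assumed 100 A swing in 1 ns

print(f"average current: {i_avg:.0f} A")
print(f"current slew   : {di_dt / 1e9:.0f} GA/s")
```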

Consider the problem of the resistance of power-distribution wires. A current of 1 A
running through a wire with a resistance of 1 Ω causes a voltage drop of 1 V. With supply
voltages of modern digital circuits ranging between 1.2 and 2.5 V, such a drop is unacceptable.
Making the wires wider reduces the resistance, and hence the voltage drop. While
this sizing of the power network is relatively simple in a flat design approach, it is a lot
more complex in a hierarchical design. For example, consider the two blocks below in
Figure 1.8a [Saleh01]. If power distribution for Block A is examined in isolation, the additional
loading due to the presence of Block B is not taken into account. If power is routed
through Block A to Block B, a larger IR drop will occur in Block B since power is also
being consumed by Block A before it reaches Block B.

Figure 1.8 Power distribution network design.
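The underestimate made by analyzing Block A in isolation can be illustrated with Ohm's law on a two-segment wire. The sketch assumes a simple series model (pin, wire segment, Block A, wire segment, Block B) with each block drawing a constant current; the function names and all resistance and current values are made up for illustration.

```python
def series_drops(r_pin_to_a, r_a_to_b, i_a, i_b):
    """IR drops seen at Block A and Block B when power is routed through A to B.

    The first wire segment carries the current of both blocks,
    the second segment carries only the current of Block B.
    """
    drop_a = r_pin_to_a * (i_a + i_b)
    drop_b = drop_a + r_a_to_b * i_b
    return drop_a, drop_b


def isolated_drop_a(r_pin_to_a, i_a):
    """Drop at Block A when Block B is (wrongly) left out of the analysis."""
    return r_pin_to_a * i_a


# Hypothetical values: 50 milliohm per segment, 1 A per block.
drop_a, drop_b = series_drops(0.05, 0.05, i_a=1.0, i_b=1.0)
print(f"isolated estimate for A: {isolated_drop_a(0.05, 1.0):.3f} V")
print(f"actual drop at A       : {drop_a:.3f} V")
print(f"actual drop at B       : {drop_b:.3f} V")
```

In this toy model the isolated analysis underestimates the drop at Block A by the share of current that flows on to Block B, and Block B always sees the larger drop.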

Since the total IR drop is based on the resistance seen from the pin to the block, one 
could route around the block and feed power to each block separately, as shown in Figure 
1.8b. Ideally, the main trunks should be large enough to handle all the current flowing 
through separate branches. Although routing power this way is easier to control and main- 
tain, it also requires more area to implement. The large metal trunks of power have to be 
sized to handle all the current for each block. This requirement forces designers to set 
aside area for power busing that takes away from the available routing area. 

As more and more blocks are added, the complex interactions between the blocks 
determine the actual voltage drops. For instance, it is not always easy to determine which 
way the current will flow when multiple parallel paths are available between the power 
source and the consuming gate. Also, currents into the different modules rarely peak at
the same time. All these considerations make the design of the power-distribution network a challenging
job. It requires a design methodology approach that supersedes the artificial
boundaries imposed by hierarchical design. 



The purpose of this textbook is to provide a bridge between the abstract vision of 
digital design and the underlying digital circuit and its peculiarities. While starting from a 
solid understanding of the operation of electronic devices and an in-depth analysis of the 
nucleus of digital design — the inverter — we will gradually channel this knowledge into 
the design of more complex entities, such as complex gates, datapaths, registers, control- 
lers, and memories. The persistent quest for a designer when designing each of the men- 
tioned modules is to identify the dominant design parameters, to locate the section of the 
design he should focus his optimizations on, and to determine the specific properties that 
make the module under investigation (e.g., a memory) different from any others.
The text also addresses other compelling (global) issues in modern digital circuit
design such as power dissipation, interconnect, timing, and synchronization. 





1.3 Quality Metrics of a Digital Design 

This section defines a set of basic properties of a digital design. These properties help to 
quantify the quality of a design from different perspectives: cost, functionality, robustness, 
performance, and energy consumption. Which one of these metrics is most important 
depends upon the application. For instance, pure speed is a crucial property in a compute 
server. On the other hand, energy consumption is a dominant metric for hand-held mobile 
applications such as cell phones. The introduced properties are relevant at all levels of the 
design hierarchy, be it system, chip, module, or gate. To ensure consistency in the definitions
throughout the design hierarchy stack, we propose a bottom-up approach: we start
with defining the basic quality metrics of a simple inverter, and gradually expand these to 
the more complex functions such as gate, module, and chip. 

1.3.1 Cost of an Integrated Circuit

The total cost of any product can be separated into two components: the recurring 
expenses or the variable cost, and the non-recurring expenses or the fixed cost.

Fixed Cost 

The fixed cost is independent of the sales volume, the number of products sold. An impor- 
tant component of the fixed cost of an integrated circuit is the effort in time and man- 
power it takes to produce the design. This design cost is strongly influenced by the com- 
plexity of the design, the aggressiveness of the specifications, and the productivity of the 
designer. Advanced design methodologies that automate major parts of the design process 
can help to boost the latter. Bringing down the design cost in the presence of an ever- 
increasing IC complexity is one of the major challenges that is always facing the semicon- 
ductor industry. 

Additionally, one has to account for the indirect costs, the company overhead that 
cannot be billed directly to one product. It includes amongst others the company’s 
research and development (R&D), manufacturing equipment, marketing, sales, and build- 
ing infrastructure. 

Variable Cost 

This accounts for the cost that is directly attributable to a manufactured product, and is 
hence proportional to the product volume. Variable costs include the costs of the parts 
used in the product, assembly costs, and testing costs. The total cost of an integrated cir- 
cuit is now 




\[
\text{cost per IC} = \text{variable cost per IC} + \frac{\text{fixed cost}}{\text{volume}} \tag{1.1}
\]








chapter l.fm Page 22 Friday, January 18, 2002 8:58 AM 










INTRODUCTION Chapter 1 





Figure 1.9 Finished wafer. Each square represents an individual die, in this case
the AMD Duron™ microprocessor (reprinted with permission from AMD).



The impact of the fixed cost is more pronounced for small-volume products. This also
explains why it makes sense to have a large design team working for a number of years on a
hugely successful product such as a microprocessor.
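The volume dependence of Eq. (1.1) is easy to see numerically. In the sketch below the function name and the cost figures are hypothetical; only the formula itself comes from the text.

```python
def cost_per_ic(variable_cost_per_ic: float, fixed_cost: float, volume: int) -> float:
    """Eq. (1.1): the fixed cost is amortized over the number of units sold."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return variable_cost_per_ic + fixed_cost / volume


# Hypothetical product: 5 per unit in variable cost, 1,000,000 in fixed cost.
for volume in (1_000, 100_000, 1_000_000):
    print(f"volume {volume:>9,}: cost per IC = {cost_per_ic(5.0, 1_000_000.0, volume):,.2f}")
```

At a thousand units the fixed cost dominates completely, while at a million units it adds only a small amount per IC, which is why a large design investment can pay off for a high-volume part.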

While the cost of producing a single transistor has dropped exponentially over the 
past decades, the basic variable-cost equation has not changed: 



\[
\text{variable cost} = \frac{\text{cost of die} + \text{cost of die test} + \text{cost of packaging}}{\text{final test yield}} \tag{1.2}
\]



As will be elaborated on in Chapter 2, the IC manufacturing process groups a number of 
identical circuits onto a single wafer (Figure 1.9). Upon completion of the fabrication, the 
wafer is chopped into dies, which are then individually packaged after being tested. We
will focus on the cost of the dies in this discussion. The cost of packaging and test is the 
topic of later chapters. 

The die cost depends upon the number of dies on a wafer, and the percentage of
those that are functional. The latter factor is called the die yield.



\[
\text{cost of die} = \frac{\text{cost of wafer}}{\text{dies per wafer} \times \text{die yield}} \tag{1.3}
\]



The number of dies per wafer is, in essence, the area of the wafer divided by the die 
area. The actual situation is somewhat more complicated as wafers are round, and chips are 
square. Dies around the perimeter of the wafer are therefore lost. The size of the wafer has 
been steadily increasing over the years, yielding more dies per fabrication run. Eq. (1.3) 
also presents the first indication that the cost of a circuit is dependent upon the chip
area: increasing the chip area simply means that fewer dies fit on a wafer.

The actual relation between cost and area is more complex, and depends upon the 
die yield. Both the substrate material and the manufacturing process introduce faults that 
can cause a chip to fail. Assuming that the defects are randomly distributed over the wafer, 
and that the yield is inversely proportional to the complexity of the fabrication process, we 
obtain the following expression of the die yield:

$$\text{die yield} = \left(1 + \frac{\text{defects per unit area} \times \text{die area}}{\alpha}\right)^{-\alpha} \tag{1.4}$$



$\alpha$ is a parameter that depends upon the complexity of the manufacturing process, and is
roughly proportional to the number of masks. $\alpha = 3$ is a good estimate for today’s complex
CMOS processes. The defects per unit area is a measure of the material and process
induced faults. A value between 0.5 and 1 defects/cm² is typical these days, but depends
strongly upon the maturity of the process.



Example 1.3 Die Yield 

Assume a wafer size of 12 inches, a die size of 2.5 cm², 1 defect/cm², and $\alpha = 3$. Determine the
die yield of this CMOS process run.

The number of dies per wafer can be estimated with the following expression, which
takes into account the lost dies around the perimeter of the wafer.

$$\text{dies per wafer} = \frac{\pi \times (\text{wafer diameter}/2)^2}{\text{die area}} - \frac{\pi \times \text{wafer diameter}}{\sqrt{2 \times \text{die area}}}$$

This means 252 (= 296 - 44) potentially operational dies for this particular example. The die
yield can be computed with the aid of Eq. (1.4), and equals 16%! This means that on
average only 40 of the dies will be fully functional.
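The arithmetic of this example can be reproduced with a short script. The following Python sketch is an illustration only: the function names and the 2.54 cm/inch conversion are assumptions that do not appear in the text, so the die count it reports (roughly 250) differs slightly from the rounded 252 quoted above.

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Estimated dies per wafer, subtracting the dies lost around the perimeter."""
    gross = math.pi * (wafer_diameter_cm / 2) ** 2 / die_area_cm2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return gross - edge_loss

def die_yield(defects_per_cm2, die_area_cm2, alpha):
    """Die yield following Eq. (1.4)."""
    return (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

def cost_of_die(wafer_cost, n_dies, yield_fraction):
    """Cost of one good die following Eq. (1.3)."""
    return wafer_cost / (n_dies * yield_fraction)

# Values of Example 1.3; the wafer cost used below is a made-up number.
n_dies = dies_per_wafer(12 * 2.54, 2.5)
y = die_yield(1.0, 2.5, 3)
print(f"dies per wafer: {n_dies:.0f}, die yield: {y:.1%}")
print(f"good dies per wafer: {n_dies * y:.0f}")
print(f"cost per die for a hypothetical $1000 wafer: ${cost_of_die(1000, n_dies, y):.2f}")
```

Re-running the sketch with a smaller die area shows how quickly the yield term, and with it the cost per die, improves.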



The bottom line is that the number of functional dies per wafer, and hence the
cost per die, is a strong function of the die area. While the yield tends to be excellent for the
smaller designs, it drops rapidly once a certain threshold is exceeded. Bearing in mind the
equations derived above and the typical parameter values, we can conclude that die costs
are proportional to the fourth power of the area:

$$\text{cost of die} = f(\text{die area})^4 \tag{1.5}$$

The area is a quantity that is directly controllable by the designer(s), and is the prime metric
for cost. Small area is hence a desirable property for a digital gate. The smaller the
gate, the higher the integration density and the smaller the die size. Smaller gates furthermore
tend to be faster and consume less energy, as the total gate capacitance (which is
one of the dominant performance parameters) often scales with the area.

The number of transistors in a gate is indicative of the expected implementation
area. Other parameters may have an impact, though. For instance, a complex interconnect
pattern between the transistors can cause the wiring area to dominate. The gate complexity,
as expressed by the number of transistors and the regularity of the interconnect structure,
also has an impact on the design cost. Complex structures are harder to implement
and tend to take more of the designer's valuable time. Simplicity and regularity are precious
properties in cost-sensitive designs.

1.3.2 Functionality and Robustness 

A prime requirement for a digital circuit is, obviously, that it performs the function it is 
designed for. The measured behavior of a manufactured circuit normally deviates from the
expected response. One reason for this aberration is the variation in the manufacturing
process. The dimensions, threshold voltages, and currents of an MOS transistor vary
between runs or even on a single wafer or die. The electrical behavior of a circuit can be 
profoundly affected by those variations. The presence of disturbing noise sources on or off 
the chip is another source of deviations in circuit response. The word noise in the context 
of digital circuits means “unwanted variations of voltages and currents at the logic 
nodes.” Noise signals can enter a circuit in many ways. Some examples of digital noise 
sources are depicted in Figure 1.10. For instance, two wires placed side by side in an inte- 
grated circuit form a coupling capacitor and a mutual inductance. Hence, a voltage or cur- 
rent change on one of the wires can influence the signals on the neighboring wire. Noise 
on the power and ground rails of a gate also influences the signal levels in the gate. 

Most noise in a digital system is internally generated, and the noise value is propor- 
tional to the signal swing. Capacitive and inductive cross talk, and the internally-generated 
power supply noise are examples of such. Other noise sources such as input power supply 
noise are external to the system, and their value is not related to the signal levels. For these 
sources, the noise level is directly expressed in volts or amperes. Noise sources that are a
function of the signal level are better expressed as a fraction or percentage of the signal 
level. Noise is a major concern in the engineering of digital circuits. How to cope with all 
these disturbances is one of the main challenges in the design of high-performance digital 
circuits and is a recurring topic in this book. 






Figure 1.10 Noise sources in digital circuits: (a) inductive coupling; (b) capacitive coupling;
(c) power and ground noise.



The steady-state parameters (also called the static behavior) of a gate measure how
robust the circuit is with respect to both variations in the manufacturing process and noise
disturbances. The definition and derivation of these parameters require a prior
understanding of how digital signals are represented in the world of electronic circuits.

Digital circuits (DC) perform operations on logical (or Boolean) variables. A logical
variable $x$ can only assume two discrete values:

$$x \in \{0, 1\}$$

As an example, the inversion (i.e., the function that an inverter performs) implements the 
following compositional relationship between two Boolean variables x and y: 



$$y = \overline{x}: \{x = 0 \Rightarrow y = 1;\ x = 1 \Rightarrow y = 0\} \tag{1.6}$$

A logical variable is, however, a mathematical abstraction. In a physical implementation,
such a variable is represented by an electrical quantity. This is most often a node
voltage that is not discrete but can adopt a continuous range of values. This electrical
voltage is turned into a discrete variable by associating a nominal voltage level with each logic
state: 1 ⇔ $V_{OH}$, 0 ⇔ $V_{OL}$, where $V_{OH}$ and $V_{OL}$ represent the high and the low logic levels,
respectively. Applying $V_{OH}$ to the input of an inverter yields $V_{OL}$ at the output and vice
versa. The difference between the two is called the logic or signal swing $V_{sw}$.



$$\begin{aligned} V_{OH} &= \overline{(V_{OL})} \\ V_{OL} &= \overline{(V_{OH})} \end{aligned} \tag{1.7}$$

The Voltage-Transfer Characteristic



Assume now that a logical variable $in$ serves as the input to an inverting gate that produces
the variable $out$. The electrical function of a gate is best expressed by its voltage-transfer
characteristic (VTC) (sometimes called the DC transfer characteristic), which plots the
output voltage as a function of the input voltage, $V_{out} = f(V_{in})$. An example of an inverter
VTC is shown in Figure 1.11. The high and low nominal voltages, $V_{OH}$ and $V_{OL}$, can
readily be identified: $V_{OH} = f(V_{OL})$ and $V_{OL} = f(V_{OH})$. Another point of interest of the
VTC is the gate or switching threshold voltage $V_M$ (not to be confused with the threshold
voltage of a transistor), that is defined as $V_M = f(V_M)$. $V_M$ can also be found graphically at
the intersection of the VTC curve and the line given by $V_{out} = V_{in}$. The gate threshold voltage
presents the midpoint of the switching characteristics, which is obtained when the output
of a gate is short-circuited to the input. This point will prove to be of particular interest
when studying circuits with feedback (also called sequential circuits).
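The definition $V_M = f(V_M)$ lends itself to a simple numerical solution. The Python sketch below locates the intersection of a VTC with the line $V_{out} = V_{in}$ by bisection. The tanh-shaped `example_vtc` and all of its parameter values are assumptions chosen for illustration; they do not model any particular gate from the text.

```python
import math

def example_vtc(v_in, v_dd=2.5, v_mid=1.2, steepness=4.0):
    """Hypothetical smooth inverter VTC (an assumption, not a device model)."""
    return 0.5 * v_dd * (1.0 - math.tanh(steepness * (v_in - v_mid)))

def switching_threshold(vtc, v_low, v_high, tol=1e-9):
    """Solve V_M = f(V_M) by bisection on g(v) = f(v) - v."""
    g_low = vtc(v_low) - v_low
    g_high = vtc(v_high) - v_high
    if g_low * g_high > 0:
        raise ValueError("VTC does not cross the line V_out = V_in on this interval")
    while v_high - v_low > tol:
        mid = 0.5 * (v_low + v_high)
        g_mid = vtc(mid) - mid
        if g_low * g_mid <= 0:
            v_high = mid              # the crossing lies in the lower half
        else:
            v_low, g_low = mid, g_mid  # the crossing lies in the upper half
    return 0.5 * (v_low + v_high)

v_m = switching_threshold(example_vtc, 0.0, 2.5)
print(f"V_M = {v_m:.3f} V, f(V_M) = {example_vtc(v_m):.3f} V")
```

Bisection is used here because an inverting VTC is monotonically decreasing, so $f(v) - v$ changes sign exactly once between the supply rails.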






Figure 1.11 Inverter voltage-transfer characteristic.



Even if an ideal nominal value is applied at the input of a gate, the output signal
often deviates from the expected nominal value. These deviations can be caused by noise
or by the loading on the output of the gate (i.e., by the number of gates connected to the
output signal). Figure 1.12a illustrates how a logic level is represented in reality by a range
of acceptable voltages, separated by a region of uncertainty, rather than by nominal levels
alone. The regions of acceptable high and low voltages are delimited by the $V_{IH}$ and $V_{IL}$
voltage levels, respectively. These represent by definition the points where the gain
($= dV_{out}/dV_{in}$) of the VTC equals -1 as shown in Figure 1.12b. The region between $V_{IH}$
and $V_{IL}$ is called the undefined region (sometimes also referred to as transition width, or
TW). Steady-state signals should avoid this region if proper circuit operation is to be
ensured.

Noise Margins 

For a gate to be robust and insensitive to noise disturbances, it is essential that the “0” and
“1” intervals be as large as possible. A measure of the sensitivity of a gate to noise is given
by the noise margins $NM_L$ (noise margin low) and $NM_H$ (noise margin high), which quantify
the size of the legal “0” and “1” intervals, respectively, and set a fixed maximum threshold on
the noise value:



$$\begin{aligned} NM_L &= V_{IL} - V_{OL} \\ NM_H &= V_{OH} - V_{IH} \end{aligned} \tag{1.8}$$



The noise margins represent the levels of noise that can be sustained when gates are cas- 
caded as illustrated in Figure 1.13. It is obvious that the margins should be larger than 0 
for a digital circuit to be functional and by preference should be as large as possible. 
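As a rough illustration of how these static parameters follow from a VTC, the Python sketch below extracts the unity-gain points and the noise margins numerically. The tanh-shaped `example_vtc`, the finite-difference gain estimate, and every function name are assumptions made for this sketch rather than a method prescribed by the text.

```python
import math

def example_vtc(v_in, v_dd=2.5, v_mid=1.25, steepness=4.0):
    """Hypothetical smooth inverter VTC; an assumption, not a device model."""
    return 0.5 * v_dd * (1.0 - math.tanh(steepness * (v_in - v_mid)))

def nominal_levels(vtc, iterations=50):
    """Iterate V_OH = f(V_OL), V_OL = f(V_OH), starting from V_OL = 0."""
    v_ol = 0.0
    v_oh = vtc(v_ol)
    for _ in range(iterations):
        v_ol = vtc(v_oh)
        v_oh = vtc(v_ol)
    return v_oh, v_ol

def unity_gain_points(vtc, v_min, v_max, steps=10000):
    """Return (V_IL, V_IH), the edges of the region where dVout/dVin <= -1."""
    dv = (v_max - v_min) / steps
    steep = [v_min + k * dv for k in range(steps)
             if (vtc(v_min + (k + 1) * dv) - vtc(v_min + k * dv)) / dv <= -1.0]
    if not steep:
        raise ValueError("the gain never reaches -1 on this interval")
    return steep[0], steep[-1] + dv

def noise_margins(v_oh, v_ol, v_ih, v_il):
    """Eq. (1.8): returns the pair (NM_L, NM_H)."""
    return v_il - v_ol, v_oh - v_ih

v_oh, v_ol = nominal_levels(example_vtc)
v_il, v_ih = unity_gain_points(example_vtc, 0.0, 2.5)
nm_l, nm_h = noise_margins(v_oh, v_ol, v_ih, v_il)
print(f"V_IL = {v_il:.3f} V, V_IH = {v_ih:.3f} V")
print(f"NM_L = {nm_l:.3f} V, NM_H = {nm_h:.3f} V")
```

Because the assumed curve is symmetric around the middle of the swing, both margins come out equal; an asymmetric VTC yields unequal margins.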



Figure 1.12 Mapping logic levels to the voltage domain: (a) relationship between voltage and
logic levels; (b) definition of $V_{IH}$ and $V_{IL}$.

Figure 1.13 Cascaded inverter gates: definition of noise margins.

Regenerative Property

A large noise margin is a desirable, but not sufficient requirement. Assume that a signal is
disturbed by noise and differs from the nominal voltage levels. As long as the signal is
within the noise margins, the following gate continues to function correctly, although its
output voltage varies from the nominal one. This deviation is added to the noise injected at
the output node and passed to the next gate. The effect of different noise sources may
accumulate and eventually force a signal level into the undefined region. This, fortunately,
does not happen if the gate possesses the regenerative property, which ensures that a disturbed
signal gradually converges back to one of the nominal voltage levels after passing
through a number of logical stages. This property can be understood as follows:

An input voltage $v_{in}$ ($v_{in} \in$ “0”) is applied to a chain of $N$ inverters (Figure 1.14a).
Assuming that the number of inverters in the chain is even, the output voltage $v_{out}$ ($N \to \infty$)
will equal $V_{OL}$ if and only if the inverter possesses the regenerative property. Similarly,
when an input voltage $v_{in}$ ($v_{in} \in$ “1”) is applied to the inverter chain, the output voltage
will approach the nominal value $V_{OH}$.




Figure 1.14 The regenerative property: (a) a chain of inverters; (b) simulated response of a
chain of MOS inverters.



Example 1.4 Regenerative property 

The concept of regeneration is illustrated in Figure 1.14b, which plots the simulated transient
response of a chain of CMOS inverters. The input signal to the chain is a step-waveform with
a degraded amplitude, which could be caused by noise. Instead of swinging from rail to rail,
$v_0$ only extends between 2.1 and 2.9 V. From the simulation, it can be observed that this deviation
rapidly disappears, while progressing through the chain; $v_1$, for instance, extends from
0.6 V to 4.45 V. Even further, $v_2$ already swings between the nominal $V_{OL}$ and $V_{OH}$. The
inverter used in this example clearly possesses the regenerative property.

The conditions under which a gate is regenerative can be intuitively derived by analyzing
a simple case study. Figure 1.15(a) plots the VTC of an inverter $V_{out} = f(V_{in})$ as well
as its inverse function $\mathit{finv}()$, which interchanges the roles of the x- and y-axes and is defined
as follows:

$$\mathit{in} = f(\mathit{out}) \Rightarrow \mathit{in} = \mathit{finv}(\mathit{out}) \tag{1.9}$$




Figure 1.15 Conditions for regeneration: (a) regenerative gate; (b) nonregenerative gate.



Assume that a voltage $v_0$, deviating from the nominal voltages, is applied to the first
inverter in the chain. The output voltage of this inverter equals $v_1 = f(v_0)$ and is applied to
the next inverter. Graphically this corresponds to $v_1 = \mathit{finv}(v_2)$. The signal voltage gradually
converges to the nominal signal after a number of inverter stages, as indicated by the
arrows. In Figure 1.15(b) the signal does not converge to any of the nominal voltage levels 
but to an intermediate voltage level. Hence, the characteristic is nonregenerative. The dif- 
ference between the two cases is due to the gain characteristics of the gates. To be regener- 
ative, the VTC should have a transient region (or undefined region) with a gain greater 
than 1 in absolute value, bordered by the two legal zones, where the gain should be 
smaller than 1. Such a gate has two stable operating points. This clarifies the definition of 
the V IH and the V IL levels that form the boundaries between the legal and the transient 
zones. 
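The gain condition can be explored numerically by passing a disturbed voltage through a chain of identical stages. The Python sketch below does so for two hypothetical tanh-shaped VTCs, one with a midpoint gain above 1 and one with a gain below 1 everywhere. The curve shape, the supply voltage, and the function names are assumptions made for illustration only.

```python
import math

def make_vtc(gain_at_midpoint, v_dd=2.5):
    """Build a hypothetical inverter VTC with the given |gain| at V_DD/2."""
    v_mid = 0.5 * v_dd
    k = gain_at_midpoint / v_mid  # slope magnitude at the midpoint equals v_mid * k
    def vtc(v_in):
        return v_mid * (1.0 - math.tanh(k * (v_in - v_mid)))
    return vtc

def propagate(vtc, v0, stages):
    """Return [v0, v1, ..., v_stages] with v_{k+1} = f(v_k)."""
    voltages = [v0]
    for _ in range(stages):
        voltages.append(vtc(voltages[-1]))
    return voltages

# A degraded "0" (1.0 V, below the 1.25 V midpoint) applied to both chains.
regenerative = propagate(make_vtc(5.0), 1.0, 10)
nonregenerative = propagate(make_vtc(0.5), 1.0, 40)
print("gain 5.0:", [round(v, 3) for v in regenerative[::2]])
print("gain 0.5:", [round(v, 3) for v in nonregenerative[::8]])
```

With the high-gain curve the even-numbered nodes settle close to the low nominal level, whereas the low-gain curve drifts to the intermediate midpoint voltage, mirroring the two cases of Figure 1.15.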

Noise Immunity 

While the noise margin is a meaningful means for measuring the robustness of a circuit
against noise, it is not sufficient. It expresses the capability of a circuit to “overpower” a
noise source. Noise immunity, on the other hand, expresses the ability of the system to process
and transmit information correctly in the presence of noise [Dally98]. Many digital
circuits with low noise margins have very good noise immunity because they reject a
noise source rather than overpower it. These circuits have the property that only a small
fraction of a potentially damaging noise source is coupled to the important circuit nodes.
More precisely, the transfer function between noise source and signal node is far smaller
than 1. Circuits that do not possess this property are susceptible to noise.

To study the noise immunity of a gate, we have to construct a noise budget that allocates
the power budget to the various noise sources. As discussed earlier, the noise sources
can be divided into sources that are

• proportional to the signal swing $V_{sw}$. The impact on the signal node is expressed as $g V_{sw}$.

• fixed. The impact on the signal node equals $f V_{Nf}$, with $V_{Nf}$ the amplitude of the noise
source, and $f$ the transfer function from noise to signal node.

We assume, for the sake of simplicity, that the noise margin equals half the signal swing 
(for both H and L). To operate correctly, the noise margin has to be larger than the sum of 
the coupled noise values. 

$$V_{NM} = \frac{V_{sw}}{2} \ge \sum_i f_i V_{Nfi} + \sum_j g_j V_{sw} \tag{1.10}$$

Given a set of noise sources, we can derive the minimum signal swing necessary for the
system to be operational.

$$V_{sw} \ge \frac{2 \sum_i f_i V_{Nfi}}{1 - 2 \sum_j g_j} \tag{1.11}$$



This makes it clear that the signal swing (and the noise margin) has to be large enough to
overpower the impact of the fixed sources ($f_i V_{Nfi}$). On the other hand, the sensitivity to
internal sources depends primarily upon the noise suppressing capabilities of the gate, that
is, the proportionality or gain factors $g_j$. In the presence of large gain factors, increasing the
signal swing does not do any good to suppress noise, as the noise increases proportionally.
In later chapters, we will discuss some differential logic families that suppress most of the
internal noise, and hence can get away with very small noise margins and signal swings.
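The noise budget can be evaluated with a few lines of code. The Python sketch below computes the minimum signal swing from a list of fixed sources and a list of proportionality factors, assuming (as in the text) that the noise margin equals half the signal swing. The function name and the numerical values are hypothetical.

```python
def minimum_signal_swing(fixed_sources, proportional_gains):
    """Minimum signal swing for a given noise budget.

    fixed_sources: iterable of (f_i, V_Nfi) pairs, i.e. transfer factor and amplitude.
    proportional_gains: iterable of gain factors g_j of the swing-proportional sources.
    """
    fixed_total = sum(f * v for f, v in fixed_sources)
    g_total = sum(proportional_gains)
    if g_total >= 0.5:
        # The proportional noise alone already consumes the whole noise margin,
        # so no signal swing can satisfy the budget.
        raise ValueError("sum of gain factors must stay below 0.5")
    return 2.0 * fixed_total / (1.0 - 2.0 * g_total)

# Hypothetical budget: two fixed sources and two proportional sources.
fixed = [(0.1, 0.5), (0.05, 1.0)]   # (transfer factor, amplitude in volts)
gains = [0.10, 0.15]
v_sw_min = minimum_signal_swing(fixed, gains)
print(f"minimum signal swing: {v_sw_min:.2f} V")
```

The guard on the summed gain factors reflects the observation above: once the proportional sources take up half the swing, enlarging the swing no longer helps.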



Directivity 

The directivity property requires a gate to be unidirectional, that is, changes in an output
level should not appear at any unchanging input of the same circuit. If not, an output-signal
transition reflects to the gate inputs as a noise signal, affecting the signal integrity.

In real gate implementations, full directivity can never be achieved. Some feedback 
of changes in output levels to the inputs cannot be avoided. Capacitive coupling between 
inputs and outputs is a typical example of such a feedback. It is important to minimize 
these changes so that they do not affect the logic levels of the input signals. 





Fan-In and Fan-Out

The fan-out denotes the number of load gates N that are connected to the output of the 
driving gate (Figure 1.16a). Increasing the fan-out of a gate can affect its logic output levels.
From the world of analog amplifiers, we know that this effect is minimized by making
the input resistance of the load gates as large as possible (minimizing the input currents) 
and by keeping the output resistance of the driving gate small (reducing the effects of load 
currents on the output voltage). When the fan-out is large, the added load can deteriorate 
the dynamic performance of the driving gate. For these reasons, many generic and library 
components define a maximum fan-out to guarantee that the static and dynamic perfor- 
mance of the element meet specification. 

The fan-in of a gate is defined as the number of inputs to the gate (Figure 1.16b). 
Gates with large fan-in tend to be more complex, which often results in inferior static and 
dynamic properties. 





Figure 1.16 Definition of fan-out and fan-in of a digital gate: (a) fan-out N; (b) fan-in M.



The Ideal Digital Gate 

Based on the above observations, we can define the ideal digital gate from a static per- 
spective. The ideal inverter model is important because it gives us a metric by which we 
can judge the quality of actual implementations. 

Its VTC is shown in Figure 1.17 and has the following properties: infinite gain in the 
transition region, and gate threshold located in the middle of the logic swing, with high 
and low noise margins equal to half the swing. The input and output impedances of the 
ideal gate are infinity and zero, respectively (i.e., the gate has unlimited fan-out). While 
this ideal VTC is unfortunately impossible in real designs, some implementations, such as 
the static CMOS inverter, come close. 



Figure 1.17 Ideal voltage-transfer characteristic.

Example 1.5 Voltage-Transfer Characteristic

Figure 1.18 shows an example of a voltage-transfer characteristic of an actual, but outdated
gate structure (as produced by SPICE in the DC analysis mode). The values of the dc-parameters
are derived from inspection of the graph.

$V_{OH}$ = 3.5 V; $V_{OL}$ = 0.45 V
$V_{IH}$ = 2.35 V; $V_{IL}$ = 0.66 V
$V_M$ = 1.64 V
$NM_H$ = 1.15 V; $NM_L$ = 0.21 V

The observed transfer characteristic, obviously, is far from ideal: it is asymmetrical,
has a very low value for $NM_L$, and the voltage swing of 3.05 V is substantially below the maximum
obtainable value of 5 V (which is the value of the supply voltage for this design).



Figure 1.18 Voltage-transfer characteristic of an NMOS inverter of the 1970s.



1.3.3 Performance 

From a system designer's perspective, the performance of a digital circuit expresses the
computational load that the circuit can manage. For instance, a microprocessor is often
characterized by the number of instructions it can execute per second. This performance
metric depends both on the architecture of the processor (for instance, the number of
instructions it can execute in parallel) and the actual design of logic circuitry. While the
former is crucially important, it is not the focus of this textbook. We refer the reader to the
many excellent books on this topic [for instance, Hennessy96]. When focusing on the pure
design, performance is most often expressed by the duration of the clock period (clock
cycle time), or its rate (clock frequency). The minimum value of the clock period for a
given technology and design is set by a number of factors such as the time it takes for the
signals to propagate through the logic, the time it takes to get the data in and out of the
registers, and the uncertainty of the clock arrival times. Each of these topics will be discussed
in detail in the course of this textbook. At the core of the whole performance analysis,
however, lies the performance of an individual gate.

The propagation delay $t_p$ of a gate defines how quickly it responds to a change at its
input(s). It expresses the delay experienced by a signal when passing through a gate. It is
measured between the 50% transition points of the input and output waveforms, as shown
in Figure 1.19 for an inverting gate.² Because a gate displays different response times for
rising or falling input waveforms, two definitions of the propagation delay are necessary.
The $t_{pLH}$ defines the response time of the gate for a low to high (or positive) output transition,
while $t_{pHL}$ refers to a high to low (or negative) transition. The propagation delay $t_p$ is
defined as the average of the two.



$$t_p = \frac{t_{pLH} + t_{pHL}}{2} \tag{1.12}$$




Figure 1.19 Definition of propagation delays and rise and fall times.



² The 50% definition is inspired by the assumption that the switching threshold $V_M$ is typically located in the
middle of the logic swing.

CAUTION: Observe that the propagation delay $t_p$, in contrast to $t_{pLH}$ and $t_{pHL}$, is an
artificial gate quality metric, and has no physical meaning per se. It is mostly used to compare
different semiconductor technologies, or logic design styles.

The propagation delay is not only a function of the circuit technology and topology,
but depends upon other factors as well. Most importantly, the delay is a function of the
slopes of the input and output signals of the gate. To quantify these properties, we introduce
the rise and fall times $t_r$ and $t_f$, which are metrics that apply to individual signal
waveforms rather than gates (Figure 1.19), and express how fast a signal transits between
the different levels. The uncertainty over when a transition actually starts or ends is
avoided by defining the rise and fall times between the 10% and 90% points of the waveforms,
as shown in the figure. The rise/fall time of a signal is largely determined by the
strength of the driving gate, and the load presented by the node itself, which sums the contributions
of the connecting gates (fan-out) and the wiring parasitics.

When comparing the performance of gates implemented in different technologies or
circuit styles, it is important not to confuse the picture by including parameters such as
load factors, fan-in and fan-out. A uniform way of measuring the $t_p$ of a gate, so that technologies
can be judged on an equal footing, is desirable. The de-facto standard circuit for
delay measurement is the ring oscillator, which consists of an odd number of inverters
connected in a circular chain (Figure 1.20). Due to the odd number of inversions, this circuit
does not have a stable operating point and oscillates. The period $T$ of the oscillation is
determined by the propagation time of a signal transition through the complete chain, or
$T = 2 \times t_p \times N$ with $N$ the number of inverters in the chain. The factor 2 results from the
observation that a full cycle requires both a low-to-high and a high-to-low transition. Note
that this equation is only valid for $2N t_p \gg t_f + t_r$. If this condition is not met, the circuit
might not oscillate: one “wave” of signals propagating through the ring will overlap with
a successor and eventually dampen the oscillation. Typically, a ring oscillator needs at
least five stages to be operational.
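The relation between oscillation period and gate delay is easily turned into a small helper. The Python sketch below converts between the two; the function names and the sample values are hypothetical, and the check on the stage count merely encodes the requirement of an odd number of inverters.

```python
def ring_oscillator_period(t_p, n_stages):
    """Oscillation period T = 2 * t_p * N of an N-stage ring oscillator."""
    if n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters")
    return 2.0 * t_p * n_stages

def propagation_delay_from_period(period, n_stages):
    """Average gate delay t_p extracted from a measured oscillation period."""
    return period / (2.0 * n_stages)

# Hypothetical five-stage ring built from gates with t_p = 20 ps.
period = ring_oscillator_period(20e-12, 5)
print(f"T = {period * 1e12:.0f} ps, f = {1.0 / period / 1e9:.1f} GHz")
print(f"t_p recovered from T: {propagation_delay_from_period(period, 5) * 1e12:.0f} ps")
```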




Figure 1.20 Ring oscillator circuit for propagation-delay measurement.

CAUTION: We must be extremely careful with results obtained from ring oscillator
measurements. A $t_p$ of 20 psec by no means implies that a circuit built with those gates
will operate at 50 GHz. The oscillator results are primarily useful for quantifying the differences
between various manufacturing technologies and gate topologies. The oscillator
is an idealized circuit where each gate has a fan-in and fan-out of exactly one and parasitic
loads are minimal. In more realistic digital circuits, fan-ins and fan-outs are higher, and
interconnect delays are non-negligible. The gate functionality is also substantially more
complex than a simple invert operation. As a result, the achievable clock frequency on
average is 50 to 100 times slower than the frequency predicted from ring oscillator measurements.
This is an average observation; carefully optimized designs might approach the
ideal frequency more closely.



Example 1.6 Propagation Delay of First-Order RC Network 

Digital circuits are often modeled as first-order RC networks of the type shown in Figure 
1.21. The propagation delay of such a network is thus of considerable interest. 



Figure 1.21 First-order RC network.



When applying a step input (with $v_{in}$ going from 0 to $V$), the transient response of this
circuit is known to be an exponential function, and is given by the following expression
(where $\tau = RC$, the time constant of the network):

$$v_{out}(t) = (1 - e^{-t/\tau})\, V \tag{1.13}$$

The time to reach the 50% point is easily computed as $t = \ln(2)\tau = 0.69\tau$. Similarly, it takes
$t = \ln(9)\tau = 2.2\tau$ to go from the 10% to the 90% point. It is worth memorizing these numbers, as they are
extensively used in the rest of the text.
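These constants can be checked directly from the exponential step response. The Python sketch below computes the time needed to reach a given fraction of the final value; the function names are assumptions, and the time constant is normalized to 1 for convenience. It also shows that the factor $\ln(9) \approx 2.2$ corresponds to the interval between the 10% and 90% points, while reaching 90% from the start of the step takes $\ln(10)\tau \approx 2.3\tau$.

```python
import math

def rc_step_response(t, tau, v_step=1.0):
    """Output voltage of a first-order RC network for a step input."""
    return (1.0 - math.exp(-t / tau)) * v_step

def time_to_fraction(fraction, tau):
    """Time at which the step response reaches the given fraction of its final value."""
    return -tau * math.log(1.0 - fraction)

tau = 1.0  # normalized time constant
t50 = time_to_fraction(0.5, tau)
t10 = time_to_fraction(0.1, tau)
t90 = time_to_fraction(0.9, tau)
print(f"50% point:         {t50:.2f} tau")
print(f"10%-90% rise time: {t90 - t10:.2f} tau")
print(f"0-90% time:        {t90:.2f} tau")
```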



1.3.4 Power and Energy Consumption 

The power consumption of a design determines how much energy is consumed per operation,
and how much heat the circuit dissipates. These factors influence a great number of critical
design decisions, such as the power-supply capacity, the battery lifetime, supply-line
sizing, packaging and cooling requirements. Therefore, power dissipation is an important
property of a design that affects feasibility, cost, and reliability. In the world of high-performance
computing, power consumption limits, dictated by the chip package and the heat
removal system, determine the number of circuits that can be integrated onto a single chip,
and how fast they are allowed to switch. With the increasing popularity of mobile and distributed
computation, energy limitations put a firm restriction on the number of computations
that can be performed given a minimum time between battery recharges.

Depending upon the design problem at hand, different dissipation measures have to
be considered. For instance, the peak power $P_{peak}$ is important when studying supply-line
sizing. When addressing cooling or battery requirements, one is predominantly interested
in the average power dissipation $P_{av}$. Both measures are defined in Eq. (1.14):



$$\begin{aligned} P_{peak} &= i_{peak} V_{supply} = \max[p(t)] \\ P_{av} &= \frac{1}{T}\int_0^T p(t)\,dt = \frac{V_{supply}}{T}\int_0^T i_{supply}(t)\,dt \end{aligned} \tag{1.14}$$






where $p(t)$ is the instantaneous power, $i_{supply}$ is the current being drawn from the supply
voltage $V_{supply}$ over the interval $t \in [0,T]$, and $i_{peak}$ is the maximum value of $i_{supply}$ over that
interval.
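Given a sampled supply-current waveform, both measures can be estimated numerically. The Python sketch below is one way to do this, assuming uniformly spaced samples and a constant supply voltage; the function name, the trapezoidal integration, and the triangular test waveform are illustrative choices, not taken from the text.

```python
def peak_and_average_power(i_supply, v_supply, dt):
    """Estimate (P_peak, P_av) from uniformly sampled supply current.

    i_supply: current samples in amperes covering the interval [0, T]
    v_supply: constant supply voltage in volts
    dt:       sample spacing in seconds
    """
    p = [i * v_supply for i in i_supply]            # instantaneous power p(t)
    p_peak = max(p)
    # Trapezoidal approximation of the integral of p(t) over [0, T].
    energy = sum(0.5 * (a + b) * dt for a, b in zip(p, p[1:]))
    period = dt * (len(p) - 1)
    return p_peak, energy / period

# Hypothetical triangular current pulse peaking at 2 mA, sampled every 1 ns.
samples = [0.0, 1e-3, 2e-3, 1e-3, 0.0]
p_peak, p_av = peak_and_average_power(samples, 2.5, 1e-9)
print(f"P_peak = {p_peak * 1e3:.2f} mW, P_av = {p_av * 1e3:.2f} mW")
```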

The dissipation can further be decomposed into static and dynamic components. The 
latter occurs only during transients, when the gate is switching. It is attributed to the 
charging of capacitors and temporary current paths between the supply rails, and is, there- 
fore, proportional to the switching frequency: the higher the number of switching events, 
the higher the dynamic power consumption. The static component on the other hand is 
present even when no switching occurs and is caused by static conductive paths between 
the supply rails or by leakage currents. It is always present, even when the circuit is in 
stand-by. Minimization of this consumption source is a worthwhile goal. 

The propagation delay and the power consumption of a gate are related — the propa- 
gation delay is mostly determined by the speed at which a given amount of energy can be 
stored on the gate capacitors. The faster the energy transfer (or the higher the power con- 
sumption), the faster the gate. For a given technology and gate topology, the product of 
power consumption and propagation delay is generally a constant. This product is called 
the power-delay product (or PDP) and can be considered as a quality measure for a 
switching device. The PDP is simply the energy consumed by the gate per switching 
event. The ring oscillator is again the circuit of choice for measuring the PDP of a logic 
family. 

An ideal gate is one that is fast, and consumes little energy. The energy-delay product
(E-D) is a combined metric that brings those two elements together, and is often used
as the ultimate quality metric. From the above, it should be clear that the E-D is equivalent
to power-delay².
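Written out, and taking the energy per switching event to be the PDP as stated above, the relation between the two metrics is a one-line derivation (the symbols $P_{av}$ and $t_p$ are those defined earlier in this section):

```latex
\text{PDP} = P_{av} \times t_p
\qquad\Longrightarrow\qquad
\text{E-D} = \text{PDP} \times t_p = P_{av} \times t_p^{2}
```

For a fixed power level, halving the delay therefore improves the PDP by a factor of two and the E-D by a factor of four.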

Example 1.7 Energy Dissipation of First-Order RC Network 

Let us consider again the first-order RC network shown in Figure 1.21. When applying a step
input (with $v_{in}$ going from 0 to $V$), an amount of energy is provided by the signal source to the
network. The total energy delivered by the source (from the start of the transition to the end)
can be readily computed:

$$E_{in} = \int_0^\infty i_{in}(t)\, v_{in}(t)\, dt = V \int_0^\infty C \frac{dv_{out}}{dt}\, dt = (CV) \int_0^V dv_{out} = CV^2 \tag{1.15}$$

It is interesting to observe that the energy needed to charge a capacitor from 0 to $V$ volts
with a step input is a function of the size of the voltage step and the capacitance, but is independent
of the value of the resistor. We can also compute how much of the delivered energy
gets stored on the capacitor at the end of the transition.

$$E_C = \int_0^\infty i_C(t)\, v_{out}(t)\, dt = \int_0^\infty C \frac{dv_{out}}{dt}\, v_{out}\, dt = C \int_0^V v_{out}\, dv_{out} = \frac{CV^2}{2} \tag{1.16}$$

This is exactly half of the energy delivered by the source. For those who wonder what happened
to the other half: a simple analysis shows that an equivalent amount gets dissipated
as heat in the resistor during the transition. We leave it to the reader to demonstrate that during
the discharge phase (for a step from $V$ to 0), the energy originally stored on the capacitor
gets dissipated in the resistor as well, and turned into heat.
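The energy balance of the charging transient can also be verified numerically. The Python sketch below integrates the source, capacitor, and resistor energies with a midpoint rule over 20 time constants. The component values, the integration settings, and the function name are assumptions made for this illustration.

```python
import math

def rc_charging_energies(r, c, v_step, n_steps=20000, n_tau=20.0):
    """Numerically integrate the energies for a 0 -> v_step transition.

    Returns (E_source, E_capacitor, E_resistor) in joules.
    """
    tau = r * c
    dt = n_tau * tau / n_steps
    e_source = e_cap = e_res = 0.0
    for k in range(n_steps):
        t = (k + 0.5) * dt                               # midpoint of the time slice
        i = (v_step / r) * math.exp(-t / tau)            # charging current
        v_out = v_step * (1.0 - math.exp(-t / tau))      # capacitor voltage
        e_source += v_step * i * dt
        e_cap += v_out * i * dt
        e_res += (v_step - v_out) * i * dt
    return e_source, e_cap, e_res

# Hypothetical values: 1 pF charged to 2.5 V through 1 kOhm and through 10 kOhm.
for r in (1e3, 10e3):
    e_src, e_c, e_r = rc_charging_energies(r, 1e-12, 2.5)
    print(f"R = {r:.0f} Ohm: E_in = {e_src:.3e} J, E_C = {e_c:.3e} J, E_R = {e_r:.3e} J")
```

Both resistor values give the same three energies, which illustrates the independence from the resistance noted above.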




1.4 Summary 

In this introductory chapter, we learned about the history and the trends in digital circuit 
design. We also introduced the important quality metrics, used to evaluate the quality of a 
design: cost, functionality, robustness, performance, and energy/power dissipation. At the 
end of the chapter, you can find an extensive list of reference works that may help you to
learn more about some of the topics introduced in the course of the text. 



1.5 To Probe Further 

The design of digital integrated circuits has been the topic of a multitude of textbooks and 
monographs. To help the reader find more information on some selected topics, an extensive
list of reference works is provided below. The state-of-the-art developments in the area
of digital design are generally reported in technical journals or conference proceedings, 
the most important of which are listed. 



JOURNALS AND PROCEEDINGS 

IEEE Journal of Solid-State Circuits 
IEICE Transactions on Electronics (Japan) 

Proceedings of the International Solid-State Circuits Conference (ISSCC)

Proceedings of the VLSI Circuits Symposium 

Proceedings of the Custom Integrated Circuits Conference (CICC) 

European Solid-State Circuits Conference (ESSCIRC)



1.6 Exercises

1. [E, None, 1.2] Based on the evolutionary trends described in the chapter, predict the integra- 
tion complexity and the clock speed of a microprocessor in the year 2015. Determine also 
how much DRAM should be available on a single chip at that point in time, if Moore’s law 
would still hold. 

2. [D, None, 1.2] Visit the Intel on-line microprocessor museum
(http://www.intel.com/intel/intelis/museum/exhibit/hist_micro/index.htm). While browsing
through the microprocessor hall-of-fame, determine the rate of increase in transistor counts
and clock frequencies in the 70’s, 80’s, and 90’s. Also, create a plot of the number of
transistors versus technology feature size. Spend some time browsing the site. It contains a
large amount of very interesting information.

3. [D, None, 1.2] By scanning the literature, find the leading-edge devices at this point in time in 
the following domains: microprocessor, signal processor, SRAM, and DRAM. Determine for 
each of those, the number of integrated devices, the overall area and the maximum clock 
speed. Evaluate the match with the trends predicted in section 1.2. 

4. [D, None, 1.2] Find in the library the latest November issue of the Journal of Solid State Cir- 
cuits. For each of the papers, determine its application class (such as microprocessor, signal 
processor, DRAM, SRAM), the type of manufacturing technology used (MOS, bipolar, etc.), 
the minimum feature size, the number of devices on a single die, and the maximum clock 
speed. Tabulate the results along the various application classes. 

5. [E, None, 1.2] Provide at least three examples for each of the abstraction levels described in 
Figure 1.6.



3.1 Introduction

It is a well-known premise in engineering that the conception of a complex construction 
without a prior understanding of the underlying building blocks is a sure road to failure. 
This surely holds for digital circuit design as well. The basic building blocks in today’s 
digital circuits are the silicon semiconductor devices, more specifically the MOS transis- 
tors and to a lesser degree the parasitic diodes, and the interconnect wires. The role of the 
semiconductor devices has been appreciated for a long time in the world of digital inte- 
grated circuits. On the other hand, interconnect wires have only recently started to play a 
dominant role as a result of the advanced scaling of the semiconductor technology. 

Giving the reader the necessary knowledge and understanding of these components 
is the prime motivation for the next two chapters. It is not our intention to present an in- 
depth treatment of the physics of semiconductor devices and interconnect wires. We refer 
the reader to the many excellent textbooks on semiconductor devices for that purpose, 
some of which are referenced in the To Probe Further section at the end of the chapters. 
The goal is rather to describe the functional operation of the devices, to highlight the prop- 
erties and parameters that are particularly important in the design of digital gates, and to 
introduce notational conventions. 

Another important function of this chapter is the introduction of models. Taking all 
the physical aspects of each component into account when designing complex digital cir- 
cuits leads to an unnecessary complexity that quickly becomes intractable. Such an 
approach is similar to considering the molecular structure of concrete when constructing a 
bridge. To deal with this issue, an abstraction of the component behavior called a model is 
typically employed. A range of models can be conceived for each component presenting a 
trade-off between accuracy and complexity. A simple first-order model is useful for man- 
ual analysis. It has limited accuracy but helps us to understand the operation of the circuit 
and its dominant parameters. When more accurate results are needed, complex, second- or 
higher-order models are employed in conjunction with computer-aided simulation. In this 
chapter, we present both first-order models for manual analysis as well as higher-order 
models for simulation for each component of interest. 

Designers tend to take the component parameters offered in the models for granted. 
They should be aware, however, that these are only nominal values, and that the actual 
parameter values vary with operating temperature, over manufacturing runs, or even over 
a single wafer. To highlight this issue, a short discussion on process variations and their 
impact is included in the chapter. 




3.2 The Diode 

Although diodes rarely occur directly in the schematic diagrams of present-day digital 
gates, they are still omnipresent. Each MOS transistor implicitly contains a number of 
reverse-biased diodes that directly influence the behavior of the device. In particular, the
voltage-dependent capacitances contributed by these parasitic elements play an important
role in the switching behavior of the MOS digital gate. Diodes are also used to protect the
input devices of an IC against static charges. Therefore, a brief review of the basic
properties and device equations of the diode is appropriate. Rather than being comprehensive,
we choose to focus on those aspects that prove to be influential in the design of digital
MOS circuits, that is, the operation in reverse-biased mode.¹



3.2.1 A First Glance at the Diode — The Depletion Region 



The pn-junction diode is the simplest of the semiconductor devices. Figure 3.1a shows a
cross-section of a typical pn-junction. It consists of two homogeneous regions of p- and
n-type material, separated by a region of transition from one type of doping to another,
which is assumed thin. Such a device is called a step or abrupt junction. The p-type
material is doped with acceptor impurities (such as boron), which results in the presence of
holes as the dominant or majority carriers. Similarly, the doping of silicon with donor
impurities (such as phosphorus or arsenic) creates an n-type material, where electrons are
the majority carriers. Aluminum contacts provide access to the p- and n-terminals of the
device. The circuit symbol of the diode, as used in schematic diagrams, is introduced in
Figure 3.1c.

To understand the behavior of the pn-junction diode, we often resort to a one-dimensional
simplification of the device (Figure 3.1b). Bringing the p- and n-type materials
together causes a large concentration gradient at the boundary. The electron concentration
changes from a high value in the n-type material to a very small value in the p-type
material. The reverse is true for the hole concentration. This gradient causes electrons to




Figure 3.1 Abrupt pn-junction diode and its schematic symbol: (a) cross-section of a pn-junction in an IC process; (b) one-dimensional representation; (c) diode symbol.



1 We refer the interested reader to the web-site of the textbook for a comprehensive description of the 
diode operation. 







diffuse from n to p and holes to diffuse from p to n. When the holes leave the p-type
material, they leave behind immobile acceptor ions, which are negatively charged.
Consequently, the p-type material is negatively charged in the vicinity of the pn-boundary.
Similarly, a positive charge builds up on the n-side of the boundary as the diffusing
electrons leave behind the positively charged donor ions. The region at the junction, where the
majority carriers have been removed, leaving the fixed acceptor and donor ions, is called
the depletion or space-charge region. The charges create an electric field across the
boundary, directed from the n to the p-region. This field counteracts the diffusion of holes
and electrons, as it causes electrons to drift from p to n and holes to drift from n to p.
Under equilibrium, the depletion charge sets up an electric field such that the drift currents
are equal and opposite to the diffusion currents, resulting in a zero net flow.

The above analysis is summarized in Figure 3.2, which plots the current directions, the
charge density, the electric field, and the electrostatic potential of the abrupt pn-junction
under zero-bias conditions. In the device shown, the p material is more heavily doped than
the n, or N_A > N_D, with N_A and N_D the acceptor and donor concentrations, respectively.
Hence, the charge concentration in the depletion region is higher on the p-side of the
junction. Figure 3.2 also shows that under zero bias, there exists a voltage φ_0 across the
junction, called the built-in potential. This potential has the value

$$\phi_0 = \phi_T \ln\left[\frac{N_A N_D}{n_i^2}\right] \qquad (3.1)$$

where φ_T is the thermal voltage

$$\phi_T = \frac{kT}{q} = 26\ \text{mV at 300 K} \qquad (3.2)$$

The quantity n_i is the intrinsic carrier concentration in a pure sample of the semiconductor
and equals approximately 1.5 × 10^10 cm^-3 at 300 K for silicon.



Example 3.1 Built-in Voltage of pn-junction

An abrupt junction has doping densities of N_A = 10^15 atoms/cm^3 and N_D = 10^16 atoms/cm^3.
Calculate the built-in potential at 300 K.

From Eq. (3.1),

$$\phi_0 = 26 \ln\left[\frac{10^{15} \times 10^{16}}{2.25 \times 10^{20}}\right]\ \text{mV} = 638\ \text{mV}$$
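
The calculation of Example 3.1 is easy to reproduce in a few lines. This is only a sketch: the function name is our own, and the constants are the rounded values quoted in the text (φ_T = 26 mV, n_i = 1.5 × 10^10 cm^-3).

```python
import math

PHI_T = 0.026   # thermal voltage at 300 K in volts, rounded as in Eq. (3.2)
N_I = 1.5e10    # intrinsic carrier concentration of silicon at 300 K, cm^-3

def built_in_potential(n_a, n_d, phi_t=PHI_T, n_i=N_I):
    """Eq. (3.1): phi_0 = phi_T * ln(N_A * N_D / n_i^2), densities in cm^-3."""
    return phi_t * math.log(n_a * n_d / n_i ** 2)

phi_0 = built_in_potential(n_a=1e15, n_d=1e16)
print(f"built-in potential: {phi_0 * 1e3:.1f} mV")
```

With these rounded constants the result lands within about 1 mV of the 638 mV quoted in the example; the small difference comes from rounding.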



3.2.2 Static Behavior 
The Ideal Diode Equation 

Assume now that a forward voltage V_D is applied to the junction or, in other words, that
the potential of the p-region is raised with respect to the n-region. The applied potential
lowers the potential barrier. Consequently, the flow of mobile carriers across the junction
increases as the diffusion current dominates the drift component. These carriers traverse
the depletion region and are injected into the neutral n- and p-regions, where they become
minority carriers, as is illustrated in Figure 3.3. Under the assumption that no voltage
gradient exists over the neutral regions, which is approximately the case for most modern
devices, these minority carriers will diffuse through the region as a result of the
concentration gradient until they recombine with a majority carrier. The net result is a
current flowing through the diode from the p-region to the n-region, and the diode is said
to be in the forward-bias mode.

Figure 3.3 Minority carrier concentrations in the neutral region near an abrupt pn-junction under forward-bias
conditions.

On the other hand, when a reverse voltage V_D is applied to the junction or when the
potential of the p-region is lowered with respect to the n-region, the potential barrier is
raised. This results in a reduction in the diffusion current, and the drift current becomes
dominant. A current flows from the n-region to the p-region. Since the number of minority
carriers in the neutral regions (electrons in the p-region, holes in the n-region) is very small,
this drift current component is virtually negligible (Figure 3.4). It is fair to state that in the
reverse-bias mode the diode operates as a nonconducting, or blocking, device. The diode
thus acts as a one-way conductor.




Figure 3.4 Minority carrier concentration in the neutral regions near the pn-junction under reverse-bias
conditions.



The most important property of the diode current is its exponential dependence upon
the applied bias voltage. This is illustrated in Figure 3.5, which plots the diode current I_D
as a function of the bias voltage V_D. The exponential behavior for positive-bias voltages is
even more apparent in Figure 3.5b, where the current is plotted on a logarithmic scale. The
current increases by a factor of 10 for every extra 60 mV (= 2.3 φ_T) of forward bias. At
small voltage levels (V_D < 0.15 V), a deviation from the exponential dependence can be
observed, which is due to the recombination of holes and electrons in the depletion region.

The behavior of the diode for both forward- and reverse-bias conditions is best
described by the well-known ideal diode equation, which relates the current through the
diode I_D to the diode bias voltage V_D

$$I_D = I_S \left(e^{V_D/\phi_T} - 1\right) \qquad (3.3)$$

Observe how Eq. (3.3) corresponds to the exponential behavior plotted in Figure 3.5. φ_T is
the thermal voltage of Eq. (3.2) and is equal to 26 mV at room temperature.
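
To make the exponential dependence concrete, the sketch below evaluates Eq. (3.3) directly. The saturation current used here is an assumed, illustrative value, and the function name is ours; only the equation itself comes from the text.

```python
import math

PHI_T = 0.026  # thermal voltage at room temperature, in volts

def diode_current(v_d, i_s, phi_t=PHI_T):
    """Ideal diode equation, Eq. (3.3): I_D = I_S * (exp(V_D / phi_T) - 1)."""
    return i_s * math.expm1(v_d / phi_t)

i_s = 0.5e-16  # assumed saturation current in amperes (illustration only)

# Forward bias: 60 mV of extra bias raises the current by roughly a decade.
ratio = diode_current(0.66, i_s) / diode_current(0.60, i_s)
decade_step = PHI_T * math.log(10.0)   # bias step for exactly one decade
print(f"current ratio for +60 mV: {ratio:.2f}")
print(f"bias step per decade: {decade_step * 1e3:.1f} mV")

# Reverse bias: the current saturates at -I_S.
print(f"I_D at V_D = -1 V: {diode_current(-1.0, i_s):.3e} A")
```

The ratio comes out close to 10, and the reverse current is indistinguishable from -I_S, which matches the one-way-conductor picture given above.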

I_S represents a constant value, called the saturation current of the diode. It is
proportional to the area of the diode, and a function of the doping levels and widths of the
neutral regions. Most often, I_S is determined empirically. It is worth mentioning that in
actual devices, the reverse currents are substantially larger than the saturation current I_S.
This is due to the thermal generation of hole and electron pairs in the depletion region. The
electric field present sweeps these carriers out of the region, causing an additional current
component. For typical silicon junctions, the saturation current is nominally in the range
of 10^-17 A/µm^2, while the actual reverse currents are approximately three orders of
magnitude higher. Actual device measurements are, therefore, necessary to determine realistic
values for the reverse diode leakage currents.

Figure 3.5 Diode current as a function of the bias voltage V_D: (a) on a linear scale; (b) on a logarithmic scale (forward bias only).

Models for Manual Analysis 

The derived current-voltage equations can be summarized in a set of simple models that
are useful in the manual analysis of diode circuits. A first model, shown in Figure 3.6a, is
based on the ideal diode equation Eq. (3.3). While this model yields accurate results, it has
the disadvantage of being strongly nonlinear. This prohibits a fast, first-order analysis of
the dc-operation conditions of a network. An often-used, simplified model is derived by
inspecting the diode current plot of Figure 3.5. For a “fully conducting” diode, the voltage
drop over the diode V_D lies in a narrow range, approximately between 0.6 and 0.8 V. To
first order, it is reasonable to assume that a conducting diode has a fixed voltage drop
V_Don over it. Although the value of V_Don depends upon I_S, a value of 0.7 V is typically
assumed. This gives rise to the model of Figure 3.6b, where a conducting diode is replaced
by a fixed voltage source.

Figure 3.6 Diode models: (a) ideal diode model; (b) first-order diode model.

Example 3.2 Analysis of Diode Network

Consider the simple network of Figure 3.7 and assume that V_S = 3 V, R_S = 10 kΩ, and
I_S = 0.5 × 10^-16 A. The diode current and voltage are related by the following network equation

$$V_S - R_S I_D = V_D$$

Inserting the ideal diode equation and (painfully) solving the nonlinear equation using either
numerical or iterative techniques yields the following solution: I_D = 0.224 mA, and V_D =
0.757 V. The simplified model with V_Don = 0.7 V produces similar results (V_D = 0.7 V, I_D =
0.23 mA) with far less effort. It hence makes considerable sense to use this model when
determining a first-order solution of a diode network.
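
The "painful" numerical solution of Example 3.2 takes only a few lines on a computer. The sketch below uses plain bisection, which is one of several possible iterative techniques and not necessarily the one used to obtain the numbers in the example; the function names are our own.

```python
import math

PHI_T = 0.026  # thermal voltage at room temperature, in volts

def solve_diode_network(v_s, r_s, i_s, phi_t=PHI_T, iterations=100):
    """Solve V_S - R_S * I_D(V_D) = V_D for V_D by bisection.

    I_D(V_D) is the ideal diode equation, Eq. (3.3). Returns (V_D, I_D).
    """
    def residual(v_d):
        i_d = i_s * math.expm1(v_d / phi_t)
        return v_s - r_s * i_d - v_d

    # For v_s > 0 the residual is positive at 0 and negative at v_s,
    # and it decreases monotonically in between.
    lo, hi = 0.0, v_s
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    v_d = 0.5 * (lo + hi)
    return v_d, (v_s - v_d) / r_s

def first_order_solution(v_s, r_s, v_don=0.7):
    """Fixed-voltage-drop model of Figure 3.6b."""
    return v_don, (v_s - v_don) / r_s

v_d, i_d = solve_diode_network(v_s=3.0, r_s=10e3, i_s=0.5e-16)
print(f"ideal diode model: V_D = {v_d:.3f} V, I_D = {i_d * 1e3:.3f} mA")
v_d1, i_d1 = first_order_solution(v_s=3.0, r_s=10e3)
print(f"first-order model: V_D = {v_d1:.3f} V, I_D = {i_d1 * 1e3:.3f} mA")
```

Both models agree to within a few percent on the current, which is the point of the example.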




3.2.3 Dynamic, or Transient, Behavior 

So far, we have mostly been concerned with the static, or steady-state, characteristics of 
the diode. Just as important in the design of digital circuits is the response of the device to 
changes in its bias conditions. The transient, or dynamic, response determines the maxi- 
mum speed at which the device can be operated. Because the operation mode of the diode 
is a function of the amount of charge present in both the neutral and the space-charge 
regions, its dynamic behavior is strongly determined by how fast charge can be moved 
around. 

While we could embark at this point on an in-depth analysis of the switching
behavior of the diode in the forward-biasing mode, it is our conviction that this would be
beside the point and unnecessarily complicate the discussion. In fact, all diodes in an
operational MOS digital integrated circuit are reverse-biased and are supposed to remain
so under all circumstances. Only under exceptional conditions may forward-biasing occur.
A signal overshooting (or undershooting) the supply rail is an example of such. Due to its
detrimental impact on the overall circuit operation, this should be avoided under all circumstances.

Hence, we will devote our attention solely to what governs the dynamic response of 
the diode under reverse-biasing conditions, the depletion-region charge. 







Depletion-Region Capacitance

In the ideal model, the depletion region is void of mobile carriers, and its charge is deter- 
mined by the immobile donor and acceptor ions. The corresponding charge distribution 
under zero-bias conditions was plotted in Figure 3.2. This picture can be easily extended 
to incorporate the effects of biasing. At an intuitive level, the following observations can
be easily verified: under forward-bias conditions, the potential barrier is reduced, which
means that less space charge is needed to produce the potential difference. This
corresponds to a reduced depletion-region width. On the other hand, under reverse-bias
conditions, the potential barrier is increased, corresponding to an increased space charge
and a wider depletion region. These observations are confirmed by the well-known
depletion-region expressions given below (a derivation of these expressions, which are valid
for abrupt junctions, is either simple or can be found in any textbook on devices such as
[Howe97]). One observation is crucial: due to the global charge neutrality requirement of
the diode, the total acceptor and donor charges must be numerically equal.

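1. Depletion-region charge (V_D is positive for forward bias).

$$Q_j = A_D \sqrt{2 \varepsilon_{si} q \frac{N_A N_D}{N_A + N_D} (\phi_0 - V_D)} \qquad (3.4)$$

2. Depletion-region width.

$$W_j = W_2 - W_1 = \sqrt{\frac{2 \varepsilon_{si}}{q} \frac{N_A + N_D}{N_A N_D} (\phi_0 - V_D)} \qquad (3.5)$$

3. Maximum electric field.

$$E_j = \sqrt{\frac{2q}{\varepsilon_{si}} \frac{N_A N_D}{N_A + N_D} (\phi_0 - V_D)} \qquad (3.6)$$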



In the preceding equations, ε_si stands for the electrical permittivity of silicon and equals 11.7
times the permittivity of a vacuum, or 1.053 × 10^-10 F/m. The ratio of the n- versus p-side
of the depletion-region width is determined by the doping-level ratios: W_2/(−W_1) = N_A/N_D.

From an abstract point of view, it is possible to visualize the depletion region as a
capacitance, albeit one with very special characteristics. Because the space-charge region
contains few mobile carriers, it acts as an insulator with a dielectric constant ε_si of the
semiconductor material. The n- and p-regions act as the capacitor plates. A small change
in the voltage applied to the junction dV_D causes a change in the space charge dQ_j. Hence,
a depletion-layer capacitance can be defined



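$$C_j = \frac{dQ_j}{dV_D} = A_D \sqrt{\frac{\varepsilon_{si} q}{2} \frac{N_A N_D}{N_A + N_D} (\phi_0 - V_D)^{-1}} = \frac{C_{j0}}{\sqrt{1 - V_D/\phi_0}} \qquad (3.7)$$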



where C_j0 is the capacitance under zero-bias conditions and is only a function of the
physical parameters of the device.















chapter3.fm Page 86 Friday, January 18, 2002 9:00 AM 




86 



THE DEVICES Chapter 3 




Figure 3.8 Junction capacitance (in fF/µm^2) as a function of the applied bias voltage.






$$C_{j0} = A_D \sqrt{\frac{\varepsilon_{si} q}{2} \frac{N_A N_D}{N_A + N_D} \phi_0^{-1}} \qquad (3.8)$$



Notice that the same capacitance value is obtained when using the standard parallel-plate
capacitor equation C_j = ε_si A_D/W_j (with W_j given in Eq. (3.5)). Typically, the A_D factor is
omitted, and C_j and C_j0 are expressed as a capacitance per unit area.
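
The equivalence with the parallel-plate formula can be verified numerically. The sketch below is only an illustration: it reuses the doping levels and built-in potential of Example 3.1 (converted to m^-3) as assumed inputs, takes the permittivity value quoted above, and uses a rounded value for the electron charge; the function names are ours.

```python
import math

Q = 1.602e-19        # electron charge in coulombs (rounded)
EPS_SI = 1.053e-10   # permittivity of silicon as quoted in the text, F/m

def depletion_width(n_a, n_d, phi_0, v_d=0.0):
    """Eq. (3.5); doping levels in m^-3, result in meters."""
    return math.sqrt(2.0 * EPS_SI / Q * (n_a + n_d) / (n_a * n_d) * (phi_0 - v_d))

def junction_cap_per_area(n_a, n_d, phi_0, v_d=0.0):
    """Eq. (3.7) without the A_D factor; result in F/m^2."""
    return math.sqrt(EPS_SI * Q / 2.0 * n_a * n_d / (n_a + n_d) / (phi_0 - v_d))

# Assumed inputs: doping of Example 3.1 (1e15 and 1e16 cm^-3) in m^-3.
n_a, n_d, phi_0 = 1e21, 1e22, 0.638

w_j = depletion_width(n_a, n_d, phi_0)
c_direct = junction_cap_per_area(n_a, n_d, phi_0)
c_plate = EPS_SI / w_j   # parallel-plate capacitance per unit area

print(f"zero-bias depletion width: {w_j * 1e6:.2f} um")
print(f"C_j0 from Eq. (3.7):       {c_direct:.4e} F/m^2")
print(f"C_j0 from eps_si / W_j:    {c_plate:.4e} F/m^2")
```

The two capacitance values agree to floating-point precision, for zero bias as well as for any other bias below φ_0.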

The resulting junction capacitance is plotted as a function of the bias voltage in
Figure 3.8 for a typical silicon diode found in MOS circuits. A strong nonlinear
dependence can be observed. Note also that the capacitance decreases with an increasing reverse
bias: a reverse bias of 5 V reduces the capacitance by more than a factor of two.



Example 3.3 Junction Capacitance 

Consider the following silicon junction diode: C_j0 = 2 × 10^-3 F/m^2, A_D = 0.5 µm^2, and φ_0 =
0.64 V. A reverse bias of -2.5 V results in a junction capacitance of 0.9 × 10^-3 F/m^2 (0.9
fF/µm^2), or, for the total diode, a capacitance of 0.45 fF.
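
The numbers of Example 3.3 follow directly from Eq. (3.7). A minimal sketch, with a function name of our own choosing:

```python
import math

def junction_capacitance(c_j0, v_d, phi_0):
    """Abrupt-junction capacitance, Eq. (3.7): C_j0 / sqrt(1 - V_D / phi_0)."""
    return c_j0 / math.sqrt(1.0 - v_d / phi_0)

c_j0 = 2e-3        # zero-bias capacitance per unit area, F/m^2
a_d = 0.5e-12      # diode area: 0.5 um^2 expressed in m^2
phi_0 = 0.64       # built-in potential, V

c_j = junction_capacitance(c_j0, v_d=-2.5, phi_0=phi_0)
c_total = c_j * a_d
print(f"C_j     = {c_j * 1e3:.2f} fF/um^2")   # 1e-3 F/m^2 equals 1 fF/um^2
print(f"C_total = {c_total * 1e15:.2f} fF")
```

Note the unit conversion in the print statement: 10^-3 F/m^2 corresponds to 1 fF/µm^2.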



Equation (3.7) is only valid under the condition that the pn-junction is an abrupt
junction, where the transition from n to p material is instantaneous. This is often not the
case in actual integrated-circuit pn-junctions, where the transition from n to p material can
be gradual. In those cases, a linear distribution of the impurities across the junction is a
better approximation than the step function of the abrupt junction. An analysis of the
linearly graded junction shows that the junction capacitance equation of Eq. (3.7) still
holds, but with a different exponent in the denominator. A more generic expression for the
junction capacitance can be provided.



$$C_j = \frac{C_{j0}}{(1 - V_D/\phi_0)^m} \qquad (3.9)$$

where m is called the grading coefficient and equals 1/2 for the abrupt junction and 1/3 for 
the linear or graded junction. Both cases are illustrated in Figure 3.8. 



Large-Signal Depletion-Region Capacitance 

Figure 3.8 raises awareness of the fact that the junction capacitance is a voltage-dependent
parameter whose value varies widely between bias points. In digital circuits, operating
voltages tend to move rapidly over a wide range. Under those circumstances, it is more
attractive to replace the voltage-dependent, nonlinear capacitance C_j by an equivalent,
linear capacitance C_eq. C_eq is defined such that, for a given voltage swing from voltages
V_high to V_low, the same amount of charge is transferred as would be predicted by the
nonlinear model



$$C_{eq} = \frac{\Delta Q_j}{\Delta V_D} = \frac{Q_j(V_{high}) - Q_j(V_{low})}{V_{high} - V_{low}} = K_{eq} C_{j0} \qquad (3.10)$$



Combining Eq. (3.4) (extended to accommodate the grading coefficient m) and Eq.
(3.10) yields the value of K_eq.



$$K_{eq} = \frac{-\phi_0^m}{(V_{high} - V_{low})(1 - m)} \left[ (\phi_0 - V_{high})^{1-m} - (\phi_0 - V_{low})^{1-m} \right] \qquad (3.11)$$



Example 3.4 Average Junction Capacitance 

The diode of Example 3.3 is switched between 0 and -2.5 V. Compute the average junction
capacitance (m = 0.5).

For the defined voltage range and for φ_0 = 0.64 V, K_eq evaluates to 0.622. The average
capacitance hence equals 1.24 fF/µm^2.
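
Example 3.4 can be reproduced with the sketch below. Besides evaluating Eq. (3.11) in closed form, it also averages the nonlinear capacitance of Eq. (3.9) numerically over the voltage swing, as a sanity check that both approaches describe the same quantity. The function names and the number of integration steps are our own choices.

```python
def k_eq(v_high, v_low, phi_0, m=0.5):
    """Closed-form multiplier of Eq. (3.11)."""
    return (-phi_0 ** m / ((v_high - v_low) * (1.0 - m))
            * ((phi_0 - v_high) ** (1.0 - m) - (phi_0 - v_low) ** (1.0 - m)))

def k_eq_numeric(v_high, v_low, phi_0, m=0.5, steps=10_000):
    """Average of C_j / C_j0 from Eq. (3.9) over [v_low, v_high], midpoint rule."""
    dv = (v_high - v_low) / steps
    total = 0.0
    for i in range(steps):
        v = v_low + (i + 0.5) * dv
        total += (1.0 - v / phi_0) ** (-m)
    return total / steps

k = k_eq(v_high=0.0, v_low=-2.5, phi_0=0.64)
c_eq = k * 2.0   # C_j0 = 2 fF/um^2 from Example 3.3
print(f"K_eq (closed form) = {k:.3f}")
print(f"K_eq (numeric)     = {k_eq_numeric(0.0, -2.5, 0.64):.3f}")
print(f"C_eq               = {c_eq:.2f} fF/um^2")
```

The same functions accept m = 1/3 for a linearly graded junction.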




3.2.4 The Actual Diode — Secondary Effects 

In practice, the diode current is less than what is predicted by the ideal diode equation. Not
all applied bias voltage appears directly across the junction, as there is always some
voltage drop over the neutral regions. Fortunately, the resistance of the neutral zones is
generally small (between 1 and 100 Ω, depending upon the doping levels) and the voltage drop
only becomes significant for large currents (>1 mA). This effect can be modeled by adding
a resistor in series with the n- and p-region diode contacts.

In the discussion above, it was further assumed that under sufficient reverse bias, the
reverse current reaches a constant value, which is essentially zero. When the reverse bias
exceeds a certain level, called the breakdown voltage, the reverse current shows a dramatic
increase, as shown in Figure 3.9. In the diodes found in typical CMOS processes,
this increase is caused by avalanche breakdown. The increasing reverse bias heightens
the magnitude of the electrical field across the junction. Consequently, carriers crossing
the depletion region are accelerated to high velocity. At a critical field E_crit, the carriers
reach a high enough energy level that electron-hole pairs are created on collision with
immobile silicon atoms. These carriers create, in turn, more carriers before leaving the
depletion region. The value of E_crit is approximately 2 × 10^5 V/cm for impurity
concentrations of the order of 10^16 cm^-3. While avalanche breakdown in itself is not destructive and
its effects disappear after the reverse bias is removed, maintaining a diode for a long time
in avalanche conditions is not recommended as the high current levels and the associated
heat dissipation might cause permanent damage to the structure. Observe that avalanche
breakdown is not the only breakdown mechanism encountered in diodes. For highly doped
diodes, another mechanism, called Zener breakdown, can occur. Discussion of this
phenomenon is beyond the scope of this text.

Finally, it is worth mentioning that the diode current is affected by the operating 
temperature in a dual way: 

1. The thermal voltage φ_T, which appears in the exponent of the current equation, is
linearly dependent upon the temperature. An increase in φ_T causes the current to
drop.

2. The saturation current I_S is also temperature-dependent, as the thermal equilibrium
carrier concentrations increase with increasing temperature. Theoretically, the
saturation current approximately doubles every 5 °C. Experimentally, the reverse
current has been measured to double every 8 °C.

This dual dependence has a significant impact on the operation of a digital circuit. First of 
all, current levels (and hence power consumption) can increase substantially. For instance, 
for a forward bias of 0.7 V at 300 K, the current increases approximately 6%/°C, and dou- 
bles every 12 °C. Secondly, integrated circuits rely heavily on reverse-biased diodes as 
isolators. Increasing the temperature causes the leakage current to increase and decreases 
the isolation quality. 
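
The relation between a per-degree growth rate and a doubling interval is simple compound growth. The sketch below only illustrates that arithmetic with the figures quoted above; it assumes a constant rate per degree, which is an approximation, and the function names are ours.

```python
import math

def doubling_interval(rate_per_degree):
    """Temperature rise (in °C) over which a quantity growing at a constant
    fractional rate per degree doubles."""
    return math.log(2.0) / math.log(1.0 + rate_per_degree)

def per_degree_rate(doubling_every):
    """Fractional increase per °C for a quantity that doubles every
    `doubling_every` degrees."""
    return 2.0 ** (1.0 / doubling_every) - 1.0

def growth_factor(delta_t, doubling_every):
    """Multiplication factor after a temperature rise of delta_t degrees."""
    return 2.0 ** (delta_t / doubling_every)

print(f"6 %/°C doubles every {doubling_interval(0.06):.1f} °C")
print(f"doubling every 8 °C means {per_degree_rate(8.0) * 100:.1f} %/°C")
print(f"reverse current after +40 °C: x{growth_factor(40.0, 8.0):.0f}")
```

A 6%/°C increase indeed corresponds to a doubling roughly every 12 °C, while a reverse current that doubles every 8 °C grows by a factor of 32 over a 40 °C rise.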

3.2.5 The SPICE Diode Model 

In the preceding sections, we have presented a model for manual analysis of a diode
circuit. For more complex circuits, or when a more accurate modeling of the diode that takes
into account second-order effects is required, manual circuit evaluation becomes
intractable, and computer-aided simulation is necessary. While different circuit simulators have
been developed over the last decades, the SPICE program, developed at the University of
California at Berkeley, is definitely the most successful [Nagel75]. Simulating an
integrated circuit containing active devices requires a mathematical model for those devices
(which is called the SPICE model in the rest of the text). The accuracy of the simulation
depends directly upon the quality of this model. For instance, one cannot expect to see the
result of a second-order effect in the simulation if this effect is not present in the device
model. Creating accurate and computation-efficient SPICE models has been a long
process and is by no means finished. Every major semiconductor company has developed
its own proprietary models, which it claims have either better accuracy or computational
efficiency and robustness.

Figure 3.10 SPICE diode model.

The standard SPICE model for a diode is simple, as shown in Figure 3.10. The
steady-state characteristic of the diode is modeled by the nonlinear current source I_D,
which is a modified version of the ideal diode equation

$$I_D = I_S \left(e^{V_D / n\phi_T} - 1\right) \qquad (3.12)$$

The extra parameter n is called the emission coefficient. It equals 1 for most
common diodes but can be somewhat higher than 1 for others. The resistor R_S models the
series resistance contributed by the neutral regions on both sides of the junction. For
higher current levels, this resistance causes the internal diode voltage V_D to differ from the
externally applied voltage, hence causing the current to be lower than what would be expected
from the ideal diode equation.

The dynamic behavior of the diode is modeled by the nonlinear capacitance C_D,
which combines the two different charge-storage effects in the diode: the space (or
depletion-region) charge, and the excess minority carrier charge. Only the former was discussed
in this chapter, as the latter is only an issue under forward-biasing conditions.







$$C_D = \frac{C_{j0}}{(1 - V_D/\phi_0)^m} + \frac{\tau_T I_S}{\phi_T} e^{V_D / n\phi_T} \qquad (3.13)$$



A listing of the parameters used in the diode model is given in Table 3.1. Besides the
parameter name, symbol, and SPICE name, the table also contains the default value used
by SPICE in case the parameter is left undefined. Observe that this table is by no means
complete. Other parameters are available to govern second-order effects such as
breakdown, high-level injection, and noise. To be concise, we chose to limit the listing to the
parameters of direct interest to this text. For a complete description of the device models
(as well as the usage of SPICE), we refer to the numerous textbooks devoted to SPICE
(e.g., [Banhzaf92], [Thorpe92]).



Table 3.1 First-order SPICE diode model parameters.

Parameter Name                   Symbol   SPICE Name   Units   Default Value
Saturation current               I_S      IS           A       1.0 E-14
Emission coefficient             n        N            -       1
Series resistance                R_S      RS           Ω       0
Transit time                     τ_T      TT           s       0
Zero-bias junction capacitance   C_j0     CJ0          F       0
Grading coefficient              m        M            -       0.5
Junction potential               φ_0      VJ           V       1
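
To see how the parameters of Table 3.1 enter the model equations, the sketch below evaluates the diode current of Eq. (3.12) and a capacitance made up of a depletion term of the form of Eq. (3.9) plus a transit-time (diffusion) term. It is our own illustration and not the actual SPICE implementation: the class and function names are hypothetical, the series resistance R_S is ignored, the thermal voltage is fixed at 26 mV, and the form of the diffusion term is an assumption on our part.

```python
import math
from dataclasses import dataclass

PHI_T = 0.026  # thermal voltage in volts (assumed fixed at room temperature)

@dataclass
class DiodeModel:
    """First-order parameters of Table 3.1 with the defaults listed there."""
    IS: float = 1.0e-14   # saturation current, A
    N: float = 1.0        # emission coefficient
    RS: float = 0.0       # series resistance, ohm (not used in this sketch)
    TT: float = 0.0       # transit time, s
    CJ0: float = 0.0      # zero-bias junction capacitance, F
    M: float = 0.5        # grading coefficient
    VJ: float = 1.0       # junction potential, V

def diode_current(model, v_d):
    """Eq. (3.12): I_D = I_S * (exp(V_D / (n * phi_T)) - 1)."""
    return model.IS * math.expm1(v_d / (model.N * PHI_T))

def diode_capacitance(model, v_d):
    """Depletion plus diffusion capacitance; only meaningful for V_D < VJ."""
    depletion = model.CJ0 / (1.0 - v_d / model.VJ) ** model.M
    diffusion = model.TT * model.IS / PHI_T * math.exp(v_d / (model.N * PHI_T))
    return depletion + diffusion

# Hypothetical diode: the values of Example 3.3 scaled to a 0.5 um^2 area.
d = DiodeModel(IS=0.5e-16, CJ0=1.0e-15, VJ=0.64)
print(f"I_D(0.7 V)  = {diode_current(d, 0.7):.3e} A")
print(f"C_D(-2.5 V) = {diode_capacitance(d, -2.5):.3e} F")
```

With TT left at its default of 0, the capacitance reduces to the depletion term alone, which is the case of interest for the reverse-biased diodes in MOS circuits.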



3.3 The MOS(FET) Transistor 

The metal-oxide-semiconductor field-effect transistor (MOSFET or MOS, for short) is
certainly the workhorse of contemporary digital design. Its major asset from a digital
perspective is that the device performs very well as a switch, and introduces few parasitic
effects. Other important advantages are its integration density combined with a relatively
“simple” manufacturing process, which make it possible to produce large and complex
circuits in an economical way.

Following the approach we took for the diode, we restrict ourselves in this section to 
a general overview of the transistor and its parameters. After a generic overview of the 
device, we present an analytical description of the transistor from a static (steady-state) 
and dynamic (transient) viewpoint. The discussion concludes with an enumeration of 
some second-order effects and the introduction of the SPICE MOS transistor models. 

3.3.1 A First Glance at the Device 

The MOSFET is a four-terminal device. The voltage applied to the gate terminal determines
if and how much current flows between the source and the drain ports. The body
represents the fourth terminal of the transistor. Its function is secondary as it only serves to
modulate the device characteristics and parameters.

At the most superficial level, the transistor can be considered to be a switch. When a
voltage is applied to the gate that is larger than a given value called the threshold voltage
V_T, a conducting channel is formed between drain and source. In the presence of a voltage
difference between the latter two, current flows between them. The conductivity of the
channel is modulated by the gate voltage: the larger the voltage difference between gate
and source, the smaller the resistance of the conducting channel and the larger the current.
When the gate voltage is lower than the threshold, no such channel exists, and the switch
is considered open.

Two types of MOSFET devices can be identified. The NMOS transistor consists of
n+ drain and source regions, embedded in a p-type substrate. The current is carried by
electrons moving through an n-type channel between source and drain. This is in contrast
with the pn-junction diode, where current is carried by both holes and electrons. MOS
devices can also be made by using an n-type substrate and p+ drain and source regions. In
such a transistor, current is carried by holes moving through a p-type channel. The device
is called a p-channel MOS, or PMOS transistor. In a complementary MOS technology
(CMOS), both devices are present. The cross-section of a contemporary dual-well CMOS
process was presented in Chapter 2, and is repeated here for convenience (Figure 3.11).



Figure 3.11 Cross-section of contemporary dual-well CMOS process.



Circuit symbols for the various MOS transistors are shown in Figure 3.12. As mentioned
earlier, the transistor is a four-port device with gate, source, drain, and body terminals
(Figure 3.12a and c). Since the body is generally connected to a dc supply that is identical
for all devices of the same type (GND for NMOS, V_DD for PMOS), it is most often not
shown on the schematics (Figure 3.12b and d). If the fourth terminal is not shown, it is
assumed that the body is connected to the appropriate supply.



Figure 3.12 Circuit symbols for MOS transistors: (a) NMOS transistor as 4-terminal device,
(b) NMOS transistor as 3-terminal device, (c) PMOS transistor as 4-terminal device,
(d) PMOS transistor as 3-terminal device.

3.3.2 The MOS Transistor under Static Conditions

In the derivation of the static model of the MOS transistor, we concentrate on the NMOS
device. All the arguments made are valid for PMOS devices as well, as will be discussed at
the end of the section.

The Threshold Voltage 

Consider first the case where V_GS = 0 and drain, source, and bulk are connected to ground.
The drain and source are connected by back-to-back pn-junctions (substrate-source and
substrate-drain). Under the mentioned conditions, both junctions have a 0 V bias and can
be considered off, which results in an extremely high resistance between drain and source.

Assume now that a positive voltage is applied to the gate (with respect to the 
source), as shown in Figure 3.13. The gate and substrate form the plates of a capacitor 
with the gate oxide as the dielectric. The positive gate voltage causes positive charge to 
accumulate on the gate electrode and negative charge on the substrate side. The latter 
manifests itself initially by repelling mobile holes. Hence, a depletion region is formed 
below the gate. This depletion region is similar to the one occurring in a pn-junction 
diode. Consequently, similar expressions hold for the width and the space charge per unit 
area. Compare these expressions to Eq. (3.4) and Eq. (3.5). 




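$$W_d = \sqrt{\frac{2\epsilon_{si}\phi}{qN_A}} \qquad (3.14)$$

$$Q_d = \sqrt{2qN_A\epsilon_{si}\phi} \qquad (3.15)$$

with N_A the substrate doping and φ the voltage across the depletion layer (i.e., the potential
at the oxide-silicon boundary).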

Figure 3.13 NMOS transistor for positive V_GS, showing depletion region and induced channel.

As the gate voltage increases, the potential at the silicon surface at some point
reaches a critical value, where the semiconductor surface inverts to n-type material. This
point marks the onset of a phenomenon known as strong inversion and occurs at a voltage
equal to twice the Fermi potential (Eq. (3.16)) (φ_F ≈ -0.3 V for typical p-type silicon
substrates):

$$\phi_F = -\phi_T \ln\left(\frac{N_A}{n_i}\right) \qquad (3.16)$$

Further increases in the gate voltage produce no further changes in the depletion-layer
width, but result in additional electrons in the thin inversion layer directly under the
oxide. These are drawn into the inversion layer from the heavily doped n+ source region.
Hence, a continuous n-type channel is formed between the source and drain regions, the
conductivity of which is modulated by the gate-source voltage.

In the presence of an inversion layer, the charge stored in the depletion region is
fixed and equals

$$Q_{B0} = \sqrt{2qN_A\epsilon_{si}\,|-2\phi_F|} \qquad (3.17)$$

This picture changes somewhat in case a substrate bias voltage V_SB is applied (V_SB is
normally positive for n-channel devices). This causes the surface potential required for strong
inversion to increase and to become |-2φ_F + V_SB|. The charge stored in the depletion
region now is expressed by Eq. (3.18).

$$Q_B = \sqrt{2qN_A\epsilon_{si}\,(|-2\phi_F + V_{SB}|)} \qquad (3.18)$$

The value of V_GS where strong inversion occurs is called the threshold voltage V_T.
V_T is a function of several components, most of which are material constants such as the
difference in work-function between gate and substrate material, the oxide thickness, the
Fermi voltage, the charge of impurities trapped at the surface between channel and gate
oxide, and the dosage of ions implanted for threshold adjustment. From the above arguments,
it has become clear that the source-bulk voltage V_SB has an impact on the threshold
as well. Rather than relying on a complex (and hardly accurate) analytical expression for
the threshold, we rely on an empirical parameter called V_T0, which is the threshold voltage
for V_SB = 0, and is mostly a function of the manufacturing process. The threshold voltage
under different body-biasing conditions can then be determined in the following manner:

$$V_T = V_{T0} + \gamma\left(\sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|}\right) \qquad (3.19)$$

The parameter γ (gamma) is called the body-effect coefficient, and expresses the impact of
changes in V_SB. Observe that the threshold voltage has a positive value for a typical
NMOS device, while it is negative for a normal PMOS transistor.

Figure 3.14 Effect of body-bias on threshold.

The effect of the well bias on the threshold voltage of an NMOS transistor is plotted in
Figure 3.14 for typical values of |-2φ_F| = 0.6 V and γ = 0.4 V^0.5. A negative bias on the
well or substrate causes the threshold to increase from 0.45 V to 0.85 V. Note also that V_SB
always has to be larger than -0.6 V in an NMOS. If not, the source-body diode becomes forward
biased, which deteriorates the transistor operation.




Example 3.5 Threshold Voltage of a PMOS Transistor 

A PMOS transistor has a threshold voltage of -0.4 V, while the body-effect coefficient
equals -0.4. Compute the threshold voltage for V_SB = -2.5 V, given 2φ_F = 0.6 V.

Using Eq. (3.19), we obtain V_T(-2.5 V) = -0.4 - 0.4 × ((2.5 + 0.6)^0.5 - 0.6^0.5) V = -0.79 V,
which is twice the threshold under zero-bias conditions!
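
The body-effect calculation of Eq. (3.19) is easy to script. The following Python sketch is
illustrative only: the function name threshold_voltage and its argument names are assumptions
of this example, and the signed argument two_phi_f stands for 2φ_F (taken as -0.6 V for an
NMOS device on a p-type substrate and +0.6 V for the PMOS device of Example 3.5).

```python
import math


def threshold_voltage(vt0, gamma, vsb, two_phi_f):
    """Threshold voltage with body effect, following Eq. (3.19).

    vt0       -- threshold at V_SB = 0 (V)
    gamma     -- body-effect coefficient (V^0.5); negative for PMOS
    vsb       -- source-bulk voltage (V)
    two_phi_f -- signed 2*phi_F (V); assumed -0.6 V for NMOS, +0.6 V for PMOS
    """
    with_bias = math.sqrt(abs(-two_phi_f + vsb))
    zero_bias = math.sqrt(abs(-two_phi_f))
    return vt0 + gamma * (with_bias - zero_bias)


# NMOS values quoted with Figure 3.14: V_T0 = 0.45 V, gamma = 0.4 V^0.5, V_SB = 2.5 V
vt_nmos = threshold_voltage(0.45, 0.4, 2.5, -0.6)

# PMOS values of Example 3.5: V_T0 = -0.4 V, gamma = -0.4, V_SB = -2.5 V
vt_pmos = threshold_voltage(-0.4, -0.4, -2.5, 0.6)

print(f"NMOS V_T = {vt_nmos:.2f} V, PMOS V_T = {vt_pmos:.2f} V")
```

Passing 2φ_F with its sign lets one function cover both device types; the absolute values in
Eq. (3.19) take care of the rest.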



Resistive Operation 

Assume now that V_GS > V_T and that a small voltage, V_DS, is applied between drain and
source. The voltage difference causes a current I_D to flow from drain to source (Figure
3.15). Using a simple analysis, a first-order expression of the current as a function of V_GS
and V_DS can be obtained.




At a point x along the channel, the voltage is V(x), and the gate-to-channel voltage at
that point equals V_GS - V(x). Under the assumption that this voltage exceeds the threshold
voltage all along the channel, the induced channel charge per unit area at point x can be
computed.

$$Q_i(x) = -C_{ox}\,[V_{GS} - V(x) - V_T] \qquad (3.20)$$

C_ox stands for the capacitance per unit area presented by the gate oxide, and equals

$$C_{ox} = \frac{\epsilon_{ox}}{t_{ox}} \qquad (3.21)$$

with ε_ox = 3.97 × ε_0 = 3.5 × 10^-11 F/m the oxide permittivity, and t_ox the thickness of the
oxide. The latter is 10 nm (= 100 Å) or smaller for contemporary processes. For an
oxide thickness of 5 nm, this translates into an oxide capacitance of 7 fF/μm^2.
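
The unit conversion behind the 7 fF/μm^2 figure is worth checking once. The short Python
sketch below does so; the constant and function names are choices of this example, and the
value of ε_0 is the standard vacuum permittivity, which the text uses implicitly.

```python
EPS_0 = 8.854e-12          # vacuum permittivity (F/m)
EPS_OX = 3.97 * EPS_0      # oxide permittivity, as given in the text (F/m)


def oxide_capacitance(t_ox_m):
    """Gate-oxide capacitance per unit area in F/m^2, from Eq. (3.21)."""
    return EPS_OX / t_ox_m


def to_ff_per_um2(c_f_per_m2):
    """Convert F/m^2 to fF/um^2 (1 F/m^2 = 1e15 fF per 1e12 um^2 = 1e3 fF/um^2)."""
    return c_f_per_m2 * 1e3


cox_5nm = to_ff_per_um2(oxide_capacitance(5e-9))
print(f"C_ox at t_ox = 5 nm: {cox_5nm:.1f} fF/um^2")
```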

The current is given as the product of the drift velocity of the carriers v_n and the
available charge. Due to charge conservation, it is a constant over the length of the channel.
W is the width of the channel in a direction perpendicular to the current flow.

$$I_D = -v_n(x)\,Q_i(x)\,W \qquad (3.22)$$

The electron velocity is related to the electric field through a parameter called the mobility
μ_n (expressed in m^2/V·s). The mobility is a complex function of crystal structure and local
electrical field. In general, an empirical value is used.

$$v_n = -\mu_n\,\xi(x) = \mu_n \frac{dV}{dx} \qquad (3.23)$$

Combining Eq. (3.20) - Eq. (3.23) yields

$$I_D\,dx = \mu_n C_{ox} W (V_{GS} - V - V_T)\,dV \qquad (3.24)$$

Integrating the equation over the length of the channel L yields the voltage-current relation
of the transistor.







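$$I_D = k'_n \frac{W}{L}\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right] = k_n\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right] \qquad (3.25)$$

k'_n is called the process transconductance parameter and equals

$$k'_n = \mu_n C_{ox} = \frac{\mu_n \epsilon_{ox}}{t_{ox}} \qquad (3.26)$$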

The product of the process transconductance k'_n and the (W/L) ratio of an (NMOS) transistor
is called the gain factor k_n of the device. For smaller values of V_DS, the quadratic
factor in Eq. (3.25) can be ignored, and we observe a linear dependence between V_DS and
I_D. The operation region where Eq. (3.25) holds is hence called the resistive or linear
region. One of its main properties is that it displays a continuous conductive channel
between source and drain regions.



NOTICE: The W and L parameters in Eq. (3.25) represent the effective channel width
and length of the transistor. These values differ from the dimensions drawn on the layout
due to effects such as lateral diffusion of the source and drain regions (L), and the
encroachment of the isolating field oxide (W). In the remainder of the text, W and L will
always stand for the effective dimensions, while a d subscript will be used to indicate the
drawn size. The following expressions relate the two parameters, with ΔW and ΔL
parameters of the manufacturing process:

$$W = W_d - \Delta W, \qquad L = L_d - \Delta L \qquad (3.27)$$




The Saturation Region 

As the value of the drain-source voltage is further increased, the assumption that the
gate-to-channel voltage is larger than the threshold all along the channel ceases to hold. This
happens when V_GS - V(x) < V_T. At that point, the induced charge is zero, and the conducting
channel disappears or is pinched off. This is illustrated in Figure 3.16, which shows (in an
exaggerated fashion) how the channel thickness gradually is reduced from source to drain
until pinch-off occurs. No channel exists in the vicinity of the drain region. Obviously, for
this phenomenon to occur, it is essential that the pinch-off condition be met at the drain
region, or

$$V_{GS} - V_{DS} < V_T \qquad (3.28)$$

Figure 3.16 NMOS transistor under pinch-off conditions.

Under those circumstances, the transistor is in the saturation region, and Eq. (3.25)
no longer holds. The voltage difference over the induced channel (from the pinch-off point
to the source) remains fixed at V_GS - V_T, and consequently, the current remains constant
(or saturates). Replacing V_DS by V_GS - V_T in Eq. (3.25) yields the drain current for the
saturation mode. It is worth observing that, to a first degree, the current is no longer a
function of V_DS. Notice also the squared dependency of the drain current with respect to the
control voltage V_GS.



$$I_D = \frac{k'_n}{2}\frac{W}{L}(V_{GS} - V_T)^2 \qquad (3.29)$$
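
The two long-channel regions can be combined into one small function, which makes the
boundary at V_DS = V_GS - V_T explicit. This is a minimal sketch under stated assumptions: the
function name and the numerical parameter values are hypothetical and chosen only for
illustration, and channel-length modulation and subthreshold conduction are ignored.

```python
def long_channel_id(vgs, vds, vt, k_prime, w_over_l):
    """First-order long-channel NMOS drain current (A).

    Uses Eq. (3.25) in the resistive region and Eq. (3.29) in saturation.
    """
    vgt = vgs - vt
    if vgt <= 0:
        return 0.0                      # no channel: device treated as off
    if vds < vgt:
        # Resistive (linear) region, Eq. (3.25)
        return k_prime * w_over_l * (vgt * vds - vds ** 2 / 2)
    # Saturation region, Eq. (3.29)
    return 0.5 * k_prime * w_over_l * vgt ** 2


# Hypothetical parameters, not taken from the text
K_PRIME = 100e-6   # A/V^2
W_OVER_L = 2.0
VT = 0.5           # V

for vds in (0.1, 0.5, 1.0, 2.0):
    i_d = long_channel_id(vgs=1.5, vds=vds, vt=VT, k_prime=K_PRIME, w_over_l=W_OVER_L)
    print(f"V_DS = {vds:.1f} V -> I_D = {i_d * 1e6:.1f} uA")
```

Both expressions give the same current at V_DS = V_GS - V_T, so the piecewise model is
continuous at the edge of saturation.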









Channel-Length Modulation 

The latter equation seems to suggest that the transistor in the saturation mode acts as a
perfect current source, or that the current between drain and source terminal is a constant,
independent of the applied voltage over the terminals. This is not entirely correct. The
effective length of the conductive channel is actually modulated by the applied V_DS: increasing
V_DS causes the depletion region at the drain junction to grow, reducing the length of the
effective channel. As can be observed from Eq. (3.29), the current increases when the length
factor L is decreased. A more accurate description of the current of the MOS transistor is
therefore given in Eq. (3.30).

$$I_D = I_D'(1 + \lambda V_{DS}) \qquad (3.30)$$

with I_D' the current expressions derived earlier, and λ an empirical parameter, called the
channel-length modulation. Analytical expressions for λ have proven to be complex and
inaccurate. λ varies roughly with the inverse of the channel length. In shorter transistors,
the drain-junction depletion region presents a larger fraction of the channel, and the
channel-modulation effect is more pronounced. It is therefore advisable to resort to
long-channel transistors if a high-impedance current source is needed.
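
The link between λ and the quality of the current source can be made explicit with a short
derivation from Eq. (3.30). The small-signal output resistance r_o is a symbol introduced for
this sketch and does not appear in the text above; I_D' is treated as independent of V_DS, as
is the case in saturation.

```latex
\frac{\partial I_D}{\partial V_{DS}} = \lambda I_D'
\quad\Longrightarrow\quad
r_o = \left(\frac{\partial I_D}{\partial V_{DS}}\right)^{-1} = \frac{1}{\lambda I_D'}
```

Since λ varies roughly with 1/L, a longer channel lowers λ and raises r_o for the same I_D'.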



Velocity Saturation 



The behavior of transistors with very short channel lengths (called short-channel devices)
deviates considerably from the resistive and saturated models presented in the previous
paragraphs. The main culprit for this deficiency is the velocity saturation effect. Eq. (3.23)
states that the velocity of the carriers is proportional to the electrical field, independent of
the value of that field. In other words, the carrier mobility is a constant. However, at high
field strengths, the carriers fail to follow this linear model. In fact, when the electrical field
along the channel reaches a critical value ξ_c, the velocity of the carriers tends to saturate
due to scattering effects (collisions suffered by the carriers). This is illustrated in Figure
3.17.




Figure 3.17 Velocity-saturation effect. 



For p-type silicon, the critical field at which electron saturation occurs is around 1.5
× 10^6 V/m (or 1.5 V/μm), and the saturation velocity v_sat approximately equals 10^5 m/s.
This means that in an NMOS device with a channel length of 1 μm, only a couple of volts
between drain and source are needed to reach the saturation point. This condition is easily
met in current short-channel devices. Holes in n-type silicon saturate at the same velocity,
although a higher electrical field is needed to achieve saturation. Velocity-saturation
effects are hence less pronounced in PMOS transistors.

This effect has a profound impact on the operation of the transistor. We will illustrate
this with a first-order derivation of the device characteristics under velocity-saturating
conditions [Ko89]. The velocity as a function of the electrical field, plotted in Figure
3.17, can be roughly approximated by the following expression:



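$$v = \begin{cases} \dfrac{\mu_n \xi}{1 + \xi/\xi_c} & \text{for } \xi \le \xi_c \\ v_{sat} & \text{for } \xi \ge \xi_c \end{cases} \qquad (3.31)$$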



The continuity requirement between the two regions dictates that ξ_c = 2v_sat/μ_n.
Re-evaluation of Eq. (3.20) and Eq. (3.22) in light of the revised velocity formula leads to a
modified expression of the drain current in the resistive region:

$$I_D = \kappa(V_{DS})\,\mu_n C_{ox} \frac{W}{L}\left[(V_{GS} - V_T)V_{DS} - \frac{V_{DS}^2}{2}\right] \qquad (3.32)$$

with

$$\kappa(V) = \frac{1}{1 + V/(\xi_c L)} \qquad (3.33)$$



κ is a measure of the degree of velocity saturation, since V_DS/L can be interpreted as the
average field in the channel. In case of long-channel devices (large values of L) or small
values of V_DS, κ approaches 1 and Eq. (3.32) simplifies to the traditional current equation
for the resistive operation mode. For short-channel devices, κ is smaller than 1, which
means that the delivered current is smaller than what would be normally expected.

When increasing the drain-source voltage, the electrical field in the channel will
ultimately reach the critical value, and the carriers at the drain become velocity saturated.
The saturation drain voltage V_DSAT can be calculated by equating the current at the drain to
the current given by Eq. (3.32) for V_DS = V_DSAT. The former is derived from Eq. (3.22),
assuming that the drift velocity is saturated and equals v_sat.

$$I_{DSAT} = v_{sat} C_{ox} W (V_{GT} - V_{DSAT}) = \kappa(V_{DSAT})\,\mu_n C_{ox} \frac{W}{L}\left[V_{GT} V_{DSAT} - \frac{V_{DSAT}^2}{2}\right] \qquad (3.34)$$

V_GT is a shorthand notation for V_GS - V_T. After some algebra, we obtain

$$V_{DSAT} = \kappa(V_{GT})\,V_{GT} \qquad (3.35)$$

Further increasing the drain-source voltage does not yield more current (to a first degree)
and the transistor current saturates at I_DSAT. This leads to some interesting observations:

• For a short-channel device and for large enough values of V_GT, κ(V_GT) is substantially
smaller than 1, hence V_DSAT < V_GT. The device enters saturation before V_DS
reaches V_GS - V_T. Short-channel devices therefore experience an extended saturation
region, and tend to operate more often in saturation conditions than their long-channel
counterparts, as is illustrated in Figure 3.18.

• The saturation current I_DSAT displays a linear dependence with respect to the gate-source
voltage V_GS, which is in contrast with the squared dependence in the long-channel
device. This reduces the amount of current a transistor can deliver for a
given control voltage. On the other hand, reducing the operating voltage does not
have such a significant effect in submicron devices as it would have in a long-channel
transistor.

Figure 3.18 Short-channel devices display an extended saturation region due to
velocity-saturation.
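
The first observation can be made concrete by evaluating Eq. (3.33) and Eq. (3.35) for a long
and a short device. The sketch below is illustrative: the function names and the gate
overdrive of 2 V are assumptions of this example, while the critical field is the 1.5 V/μm
value quoted earlier for electrons.

```python
XI_C = 1.5e6   # critical field for electrons (V/m), value quoted in the text


def kappa(v, length_m, xi_c=XI_C):
    """Degree of velocity saturation, Eq. (3.33)."""
    return 1.0 / (1.0 + v / (xi_c * length_m))


def v_dsat(vgt, length_m, xi_c=XI_C):
    """Saturation drain voltage, Eq. (3.35): V_DSAT = kappa(V_GT) * V_GT."""
    return kappa(vgt, length_m, xi_c) * vgt


VGT = 2.0  # V, an assumed gate overdrive used only for illustration
for name, length in (("long (10 um)", 10e-6), ("short (0.25 um)", 0.25e-6)):
    print(f"{name}: kappa = {kappa(VGT, length):.2f}, "
          f"V_DSAT = {v_dsat(VGT, length):.2f} V")
```

For the long device V_DSAT stays close to V_GT, whereas for the short device it collapses to a
small fraction of it, which is the extended saturation region described above.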

The equations above ignore that a larger portion of the channel becomes velocity-saturated
with a further increase of V_DS. From a modeling perspective, it appears as though the
effective channel is shortening with increasing V_DS, similar in effect to the channel-length
modulation. The resulting increase in current is easily accommodated by introducing an
extra (1 + λ × V_DS) multiplier.

Thus far we have only considered the effects of the tangential field along the channel
due to V_DS when considering velocity-saturation effects. However, there also
exists a normal (vertical) field originating from the gate voltage that further inhibits
channel carrier mobility. This effect, which is called mobility degradation, reduces the surface
mobility with respect to the bulk mobility. Eq. (3.36) provides a simple estimation of the
mobility reduction.

$$\mu_{n,eff} = \frac{\mu_{n0}}{1 + \eta(V_{GS} - V_T)} \qquad (3.36)$$

with μ_n0 the bulk mobility and η an empirical parameter. A typical approach is to
derive the actual value of μ for a given field strength from tables or empirical charts.

Readers interested in a more in-depth perspective on the short-channel effects in
MOS transistors are referred to the excellent reference works on this topic, such as
[Ko89].



Velocity Saturation — Revisited 

Unfortunately, the drain-current equations Eq. (3.32) and Eq. (3.33) are complex expressions
of V_GS and V_DS, which makes them rather unwieldy for a first-order manual analysis.
A substantially simpler model can be obtained by making two assumptions:

1. The velocity saturates abruptly at ξ_c, and is approximated by the following expression:

$$v = \begin{cases} \mu_n \xi & \text{for } \xi \le \xi_c \\ v_{sat} = \mu_n \xi_c & \text{for } \xi \ge \xi_c \end{cases} \qquad (3.37)$$

2. The drain-source voltage V_DSAT at which the critical electrical field is reached and
velocity saturation comes into play is constant and is approximated by Eq. (3.38).
From Eq. (3.35), it can be observed that this assumption is reasonable for larger values
of V_GT (>> ξ_c L).

$$V_{DSAT} = L\,\xi_c = \frac{L\,v_{sat}}{\mu_n} \qquad (3.38)$$

Under these circumstances, the current equations for the resistive region remain
unchanged from the long-channel model. Once V_DSAT is reached, the current abruptly
saturates. The value for I_DSAT at that point can be derived by plugging the saturation voltage
into the current equation for the resistive region (Eq. (3.25)).

$$\begin{aligned} I_{DSAT} &= I_D(V_{DS} = V_{DSAT}) \\ &= \mu_n C_{ox} \frac{W}{L}\left((V_{GS} - V_T)V_{DSAT} - \frac{V_{DSAT}^2}{2}\right) \\ &= v_{sat} C_{ox} W \left(V_{GS} - V_T - \frac{V_{DSAT}}{2}\right) \end{aligned} \qquad (3.39)$$
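
A quick numerical check shows the linear dependence of the velocity-saturated current on V_GS
that Eq. (3.39) implies. The sketch below is only illustrative: the function name and all
device values are hypothetical, and k' stands for μ_n C_ox as in Eq. (3.26).

```python
def i_dsat_simplified(vgs, vt, vdsat, k_prime, w_over_l):
    """Velocity-saturated drain current of Eq. (3.39), with k' = mu_n * C_ox.

    Valid when V_GS - V_T exceeds V_DSAT, so that the device saturates at V_DSAT.
    """
    vgt = vgs - vt
    return k_prime * w_over_l * (vgt * vdsat - vdsat ** 2 / 2)


# Hypothetical device values, chosen only to illustrate the trend
PARAMS = dict(vt=0.4, vdsat=0.6, k_prime=100e-6, w_over_l=1.5)

currents = [i_dsat_simplified(vgs, **PARAMS) for vgs in (1.5, 2.0, 2.5)]
steps = [b - a for a, b in zip(currents, currents[1:])]
print([f"{i * 1e6:.1f} uA" for i in currents])
print([f"{s * 1e6:.1f} uA" for s in steps])   # equal steps: linear in V_GS
```

Equal increments of V_GS produce equal increments of current, in contrast with the quadratic
growth of the long-channel expression in Eq. (3.29).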

This model is truly first-order and empirical. The simplified velocity model causes
substantial deviations in the transition zone between linear and velocity-saturated regions.
Yet, by carefully choosing the model parameters, decent matching can be obtained with
empirical data in the other operation regions, as will be shown in one of the following
sections. Most importantly, the equations are coherent with the familiar long-channel
equations, and provide the digital designer with a much-needed tool for intuitive understanding
and interpretation.

Drain Current versus Voltage Charts 

The behavior of the MOS transistor in the different operation regions is best understood
by analyzing its I_D-V_DS curves, which plot I_D versus V_DS with V_GS as a parameter. Figure
3.19 shows these charts for two NMOS transistors, implemented in the same technology
and with the same W/L ratio. One would hence expect both devices to display identical
I-V characteristics. The main difference however is that the first device has a long channel
length (L_d = 10 μm), while the second transistor is a short-channel device (L_d = 0.25 μm),
and experiences velocity saturation.

Consider first the long-channel device. In the resistive region, the transistor behaves
like a voltage-controlled resistor, while in the saturation region, it acts as a
voltage-controlled current source (when the channel-length modulation effect is ignored). The
transition between both regions is delineated by the V_DS = V_GS - V_T curve. The squared
dependence of I_D as a function of V_GS in the saturation region, typical for a long-channel
device, is clearly observable from the spacing between the different curves. The linear
dependence of the saturation current with respect to V_GS is apparent in the short-channel
device of Figure 3.19b. Notice also how velocity saturation causes the device to saturate for
substantially smaller values of V_DS. This results in a substantial drop in current drive for
high voltage levels. For instance, at (V_GS = 2.5 V, V_DS = 2.5 V), the drain current of the short
transistor is only 40% of the corresponding value of the longer device (220 μA versus 540 μA).

Figure 3.19 I-V characteristics of long- and short-channel NMOS transistors in a 0.25 μm CMOS
technology: (a) long-channel transistor (L_d = 10 μm), (b) short-channel transistor
(L_d = 0.25 μm). The (W/L) ratio of both transistors is identical and equals 1.5.




Figure 3.20 NMOS transistor I_D-V_GS characteristic for long- and short-channel devices (0.25 μm
CMOS technology): (a) long-channel device (L_d = 10 μm), (b) short-channel device
(L_d = 0.25 μm). W/L = 1.5 for both transistors and V_DS = 2.5 V.

The difference in dependence upon V_GS between long- and short-channel devices is
even more pronounced in another set of simulated charts that plot I_D as a function of V_GS
for a fixed value of V_DS (> V_GS, hence ensuring saturation) (Figure 3.20). A quadratic
versus linear dependence is apparent for larger values of V_GS.

All the derived equations hold for the PMOS transistor as well. The only difference
is that for PMOS devices, the polarities of all voltages and currents are reversed. This
is illustrated in Figure 3.21, which plots the I_D-V_DS characteristics of a minimum-size
PMOS transistor in our generic 0.25 μm CMOS process. The curves are in the third quadrant
as I_D, V_DS, and V_GS are all negative. It is also interesting to observe that the effects of
velocity saturation are less pronounced than in the NMOS devices. This can be attributed
to the higher value of the critical electrical field, resulting from the smaller mobility of
holes versus electrons.




Figure 3.21 I-V characteristics of (W_d = 0.375 μm, L_d = 0.25 μm) PMOS transistor in 0.25 μm
CMOS process. Due to the smaller mobility, the maximum current is only 42% of what is
achieved by a similar NMOS transistor.



Subthreshold Conduction 

A closer inspection of the I_D-V_GS curves of Figure 3.20 reveals that the current does not
drop abruptly to 0 at V_GS = V_T. It becomes apparent that the MOS transistor is already
partially conducting for voltages below the threshold voltage. This effect is called
subthreshold or weak-inversion conduction. The onset of strong inversion means that ample
carriers are available for conduction, but by no means implies that no current at all can flow
for gate-source voltages below V_T, although the current levels are small under those
conditions. The transition from the on- to the off-condition is thus not abrupt, but gradual.

To study this effect in somewhat more detail, we redraw the I_D versus V_GS curve of
Figure 3.20b on a logarithmic scale as shown in Figure 3.22. This confirms that the current
does not drop to zero immediately for V_GS < V_T, but actually decays in an exponential
fashion, similar to the operation of a bipolar transistor.² In the absence of a conducting
channel, the n+ (source) - p (bulk) - n+ (drain) terminals actually form a parasitic bipolar
transistor. The current in this region can be approximated by the expression

$$I_D = I_S\, e^{\frac{V_{GS}}{nkT/q}}\left(1 - e^{-\frac{V_{DS}}{kT/q}}\right) \qquad (3.40)$$

where I_S and n are empirical parameters, with n > 1 and typically ranging around 1.5.

Figure 3.22 I_D current versus V_GS (on logarithmic scale), showing the exponential
characteristic of the subthreshold region.

² Discussion of the operation of bipolar transistors is out of the scope of this textbook. We
refer to various textbooks on semiconductor devices, or to the additional information that is
available on the web-site of this textbook.

In most digital applications, the presence of subthreshold current is undesirable as it
detracts from the ideal switch-like behavior that we like to assume for the MOS transistor.
We would rather have the current drop as fast as possible once the gate-source voltage
falls below V_T. The (inverse) rate of decline of the current with respect to V_GS below V_T
hence is a quality measure of a device. It is often quantified by the slope factor S, which
measures by how much V_GS has to be reduced for the drain current to drop by a factor of
10. From Eq. (3.40), we find

$$S = n\left(\frac{kT}{q}\right)\ln(10) \qquad (3.41)$$

with S expressed in mV/decade. For an ideal transistor with the sharpest possible roll-off,
n = 1 and (kT/q) ln(10) evaluates to 60 mV/decade at room temperature, which means
that the subthreshold current drops by a factor of 10 for a reduction in V_GS of 60 mV.
Unfortunately, n is larger than 1 for actual devices and the current falls at a reduced rate
(90 mV/decade for n = 1.5). The current roll-off is further affected in a negative sense by
an increase in the operating temperature (most integrated circuits operate at temperatures
considerably beyond room temperature). The value of n is determined by the intrinsic
device topology and structure. Reducing its value hence requires a different process
technology, such as silicon-on-insulator.

Subthreshold current has some important repercussions. In general, we want the current
through the transistor to be as close as possible to zero at V_GS = 0. This is especially
important in the so-called dynamic circuits, which rely on the storage of charge on a
capacitor and whose operation can be severely degraded by subthreshold leakage. Achieving
this in the presence of subthreshold current requires a firm lower bound on the value of
the threshold voltage of the devices.

Example 3.6 Subthreshold Slope

For the example of Figure 3.22, a slope of 89.5 mV/decade is observed (between 0.2 and 0.4
V). This is equivalent to an n-factor of 1.49.
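
The relation between the slope factor and Eq. (3.40) can be verified numerically. The Python
sketch below is an illustration under stated assumptions: the function names, the 300 K
temperature, and the placeholder value of I_S are choices of this example. With exact
constants at 300 K the ideal slope comes out near 59.5 mV/decade instead of the rounded 60
mV/decade, so the n-factor for 89.5 mV/decade evaluates to about 1.50 instead of 1.49.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q_E = 1.602176634e-19   # elementary charge (C)


def thermal_voltage(temp_k=300.0):
    """kT/q in volts (300 K is an assumed room temperature)."""
    return K_B * temp_k / Q_E


def slope_factor_mv(n, temp_k=300.0):
    """Subthreshold slope S = n (kT/q) ln(10), in mV/decade."""
    return n * thermal_voltage(temp_k) * math.log(10) * 1e3


def subthreshold_current(vgs, vds, i_s, n, temp_k=300.0):
    """Subthreshold drain current of Eq. (3.40)."""
    v_th = thermal_voltage(temp_k)
    return i_s * math.exp(vgs / (n * v_th)) * (1.0 - math.exp(-vds / v_th))


n = 1.5
s_mv = slope_factor_mv(n)
# Lowering V_GS by exactly S should reduce the current tenfold.
i_hi = subthreshold_current(0.3, 2.5, i_s=1e-12, n=n)   # i_s is an arbitrary placeholder
i_lo = subthreshold_current(0.3 - s_mv * 1e-3, 2.5, i_s=1e-12, n=n)
print(f"S = {s_mv:.1f} mV/decade, current ratio = {i_hi / i_lo:.2f}")
print(f"n for S = 89.5 mV/decade: {89.5 / slope_factor_mv(1.0):.2f}")
```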



In Summary - Models for Manual Analysis 

The preceding discussions made it clear that the deep-submicron transistor is a complex
device. Its behavior is heavily non-linear and is influenced by a large number of second-order
effects. Fortunately, accurate circuit-simulation models have been developed that
make it possible to predict the behavior of a device with amazing precision over a large
range of device sizes, shapes, and operation modes, as we will discuss later in this chapter.
While excellent from an accuracy perspective, these models fail in providing a designer
with an intuitive insight into the behavior of a circuit and its dominant design parameters.
Such an understanding is necessary in the design analysis and optimization process. A
designer who lacks a clear vision of what drives and governs the circuit operation by
necessity resorts to a lengthy trial-and-error optimization process that most often leads to
an inferior solution.

The obvious question is now how to abstract the behavior of our MOS transistor into
a simple and tangible analytical model that does not lead to hopelessly complex equations,
yet captures the essentials of the device. It turns out that the first-order expressions,
derived earlier in the chapter, can be combined into a single expression that meets these
goals. The model presents the transistor as a single current source (Figure 3.23), the value
of which is defined in the figure. The reader can verify that, depending upon the
operating conditions, the model simplifies into either Eq. (3.25), Eq. (3.29), or Eq. (3.39)
(corrected for channel-length modulation).









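$$I_D = 0 \quad \text{for } V_{GT} \le 0$$

$$I_D = k' \frac{W}{L}\left(V_{GT} V_{min} - \frac{V_{min}^2}{2}\right)(1 + \lambda V_{DS}) \quad \text{for } V_{GT} \ge 0$$

with V_min = min(V_GT, V_DS, V_DSAT), V_GT = V_GS - V_T,
and V_T = V_T0 + γ(√|-2φ_F + V_SB| - √|-2φ_F|)

Figure 3.23 A unified MOS model for manual analysis. The transistor is drawn as a current
source I_D between drain (D) and source (S), with gate (G) and body (B) terminals.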



Besides being a function of the voltages at the four terminals of the transistor, the
model employs a set of five parameters: V_T0, γ, V_DSAT, k', and λ. In principle, it would be
possible to determine these parameters from the process technology and from the device
physics equations. The complexity of the device makes this a precarious task. A more
rewarding approach is to choose the values such that a good matching with the actual
device characteristics is obtained. More significantly, the model should match the best in
the regions that matter the most. In digital circuits, this is the region of high V_GS and V_DS.
The performance of an MOS digital circuit is primarily determined by the maximum
available current (i.e., the current obtained for V_GS = V_DS = supply voltage). A good
matching in this region is therefore essential.



Example 3.7 Manual Analysis Model for 0.25 µm CMOS Process³



Based on the simulated $I_D$-$V_{DS}$ and $I_D$-$V_{GS}$ plots of a ($W_d$ = 0.375 µm, $L_d$ = 0.25 µm) transistor,
implemented in our generic 0.25 micron CMOS process (Figure 3.19, Figure 3.20), we have
derived a set of device parameters to match well in the ($V_{DS}$ = 2.5 V, $V_{GS}$ = 2.5 V) region,
2.5 V being the typical supply voltage for this process. The resulting characteristics are
plotted in Figure 3.24 for the NMOS transistor, and compared to the simulated values. Overall, a
good correspondence can be observed with the exception of the transition region between
the resistive and velocity-saturation regions. This discrepancy, which is due to the simple
velocity model of Eq. (3.37) as explained earlier, is acceptable as it occurs in the lower
value-range of $V_{DS}$. It demonstrates that our model, while simple, manages to give a fair
indication of the overall behavior of the device.

Figure 3.24 Correspondence between simple model (solid line) and SPICE simulation (dotted)
for minimum-size NMOS transistor ($W_d$ = 0.375 µm, $L_d$ = 0.25 µm). Observe the discrepancy in
the transition zone between resistive and velocity saturation.



Design Data — Transistor Model for Manual Analysis 



Table 3.2 tabulates the obtained parameter values for the minimum-sized NMOS and a
similarly sized PMOS device in our generic 0.25 µm CMOS process. These values will be used as
generic model parameters in later chapters.



Table 3.2 Parameters for manual model of generic 0.25 µm CMOS process (minimum length device).

        V_T0 (V)   γ (V^0.5)   V_DSAT (V)   k' (A/V²)      λ (V⁻¹)
NMOS    0.43       0.4         0.63         115 × 10⁻⁶     0.06
PMOS    -0.4       -0.4        -1           -30 × 10⁻⁶     -0.1



³ A MATLAB implementation of the model is available on the web site of the textbook.
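
The unified model of Figure 3.23 is compact enough to evaluate in a few lines. The following Python sketch (not the MATLAB implementation mentioned in the footnote) uses the NMOS values of Table 3.2. The function names are illustrative, and the default Fermi potential `phi_f = -0.3` V is an assumed value that is not part of Table 3.2; it only matters when $V_{SB} \neq 0$.

```python
import math

# NMOS parameters of Table 3.2 (minimum-length device, generic 0.25 um process).
NMOS = {"vt0": 0.43, "gamma": 0.4, "vdsat": 0.63, "kp": 115e-6, "lam": 0.06}


def threshold_voltage(p, vsb=0.0, phi_f=-0.3):
    """V_T including the body effect. phi_f is an assumed value, not from Table 3.2."""
    return p["vt0"] + p["gamma"] * (
        math.sqrt(abs(-2 * phi_f + vsb)) - math.sqrt(abs(-2 * phi_f))
    )


def drain_current(p, vgs, vds, vsb=0.0, w_over_l=1.0, phi_f=-0.3):
    """Drain current (A) of an NMOS device according to the unified model."""
    vgt = vgs - threshold_voltage(p, vsb, phi_f)
    if vgt <= 0:
        return 0.0  # cut-off: the model ignores subthreshold conduction
    vmin = min(vgt, vds, p["vdsat"])
    return p["kp"] * w_over_l * (vgt * vmin - vmin ** 2 / 2) * (1 + p["lam"] * vds)


if __name__ == "__main__":
    # Minimum-size device of Example 3.7, using drawn dimensions: W/L = 0.375/0.25 = 1.5.
    i_max = drain_current(NMOS, vgs=2.5, vds=2.5, w_over_l=1.5)
    print(f"I_D at V_GS = V_DS = 2.5 V: {i_max * 1e6:.0f} uA")
```

With these parameters the sketch gives roughly 219 µA at $V_{GS} = V_{DS}$ = 2.5 V, where $V_{min} = V_{DSAT}$ and the device is velocity saturated.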
A word of caution: The model presented here is derived from the characteristics
of a single device with a minimum channel length and width. Trying to extrapolate this
behavior to devices with substantially different values of W and L will probably lead to
sizable errors. Fortunately, digital circuits typically use only minimum-length devices as
these lead to the smallest implementation area. Matching for these transistors will
typically be acceptable. It is, however, advisable to use a different set of model parameters
for devices with dramatically different size and shape factors.



The presented current-source model will prove to be very useful in the analysis of
the basic properties and metrics of a simple digital gate, yet its nonlinearity makes it
intractable for anything that is somewhat more complex. We therefore introduce an even
more simplified model that has the advantage of being linear and straightforward. It is
based on the underlying assumption in most digital designs that the transistor is nothing
more than a switch with an infinite off-resistance, and a finite on-resistance $R_{on}$.










Figure 3.25 NMOS transistor modeled as a switch. 



The main problem with this model is that $R_{on}$ is still time-variant, nonlinear, and
dependent upon the operating point of the transistor. When studying digital circuits in the
transient mode (that is, while switching between different logic states), it is
attractive to assume $R_{on}$ to be a constant and linear resistance $R_{eq}$, chosen so that the final
result is similar to what would be obtained with the original transistor. A reasonable
approach in that respect is to use the average value of the resistance over the operation
region of interest, or even simpler, the average value of the resistances at the end-points of
the transition. The latter assumption works well if the resistance does not experience any
strong nonlinearities over the range of the averaging interval.



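$$R_{eq} = \text{average}_{t = t_1 \ldots t_2}\left(R_{on}(t)\right) = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} R_{on}(t)\, dt \approx \frac{1}{2}\left(R_{on}(t_1) + R_{on}(t_2)\right) \qquad (3.42)$$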




Example 3.8 Equivalent resistance when (dis)charging a capacitor 

One of the most common scenarios in contemporary digital circuits is the discharging of a
capacitor from $V_{DD}$ to GND through an NMOS transistor with its gate voltage set to $V_{DD}$, or
vice versa, the charging of the capacitor to $V_{DD}$ through a PMOS with its gate at GND. Of
special interest is the point where the voltage on the capacitor reaches the mid-point ($V_{DD}/2$),
by virtue of the definition of the propagation delay as introduced in Chapter 2. Assuming
that the supply voltage is substantially larger than the velocity-saturation voltage $V_{DSAT}$ of
the transistor, it is fair to state that the transistor stays in velocity saturation for the entire
duration of the transition. This scenario is plotted for the case of an NMOS discharging a
capacitor from $V_{DD}$ to $V_{DD}/2$.



[Figure 3.26 panels: (a) schematic; (b) trajectory traversed on the $I_D$-$V_{DS}$ curve, with $V_{DS}$ going from $V_{DD}$ to $V_{DD}/2$.]

Figure 3.26 Discharging a capacitor through an NMOS transistor: Schematic (a) and I-V trajectory (b). The
instantaneous resistance of the transistor equals ($V_{DS}/I_D$) and is visualized by the angle with respect to the y-axis.

With the aid of Eq. (3.42) and Eq. , we can derive the value of the equivalent resis- 
tance, which averages the resistance of the device over the interval. 






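$$R_{eq} = \frac{1}{-V_{DD}/2} \int_{V_{DD}}^{V_{DD}/2} \frac{V}{I_{DSAT}(1 + \lambda V)}\, dV \approx \frac{3}{4} \frac{V_{DD}}{I_{DSAT}} \left(1 - \frac{7}{9} \lambda V_{DD}\right) \qquad (3.43)$$

with $I_{DSAT} = k' \frac{W}{L} \left( (V_{DD} - V_T) V_{DSAT} - \frac{V_{DSAT}^2}{2} \right)$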



A similar result can be obtained by just averaging the values of the resistance at the end points 
(and simplifying the result using a Taylor expansion): 



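$$R_{eq} = \frac{1}{2} \left( \frac{V_{DD}}{I_{DSAT}(1 + \lambda V_{DD})} + \frac{V_{DD}/2}{I_{DSAT}(1 + \lambda V_{DD}/2)} \right) \approx \frac{3}{4} \frac{V_{DD}}{I_{DSAT}} \left(1 - \frac{5}{6} \lambda V_{DD}\right) \qquad (3.44)$$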



A number of conclusions are worth drawing from the above expressions: 



• The resistance is inversely proportional to the (W/L) ratio of the device. Doubling the
transistor width halves the resistance.

• For $V_{DD} \gg V_T + V_{DSAT}/2$, the resistance becomes virtually independent of the supply
voltage. This is confirmed by the simulated equivalent resistance as a function of the supply
voltage $V_{DD}$: only a minor improvement in resistance, attributable to the channel-length
modulation, can be observed when raising the supply voltage.

• Once the supply voltage approaches $V_T$, a dramatic increase in resistance can be observed.
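
Eq. (3.43) and Eq. (3.44) can be checked numerically. The Python sketch below evaluates the integral of Eq. (3.43) with a midpoint rule and compares it with both closed-form approximations. The function names are illustrative, and the default values ($k'$, $V_T = V_{T0}$, $V_{DSAT}$, $\lambda$) are the NMOS entries of Table 3.2 with W/L = 1.

```python
def i_dsat(vdd, kp=115e-6, w_over_l=1.0, vt=0.43, vdsat=0.63):
    """Saturation current used in Eq. (3.43), with V_GS = V_DD."""
    return kp * w_over_l * ((vdd - vt) * vdsat - vdsat ** 2 / 2)


def req_integral(vdd, lam=0.06, steps=10000, **device):
    """Average of V / (I_DSAT (1 + lam V)) over V_DD/2 .. V_DD (midpoint rule)."""
    idsat = i_dsat(vdd, **device)
    lo, hi = vdd / 2, vdd
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        v = lo + (i + 0.5) * h
        total += v / (idsat * (1 + lam * v))
    return total * h / (hi - lo)


def req_closed_form(vdd, lam=0.06, **device):
    """Right-hand side of Eq. (3.43)."""
    return 0.75 * vdd / i_dsat(vdd, **device) * (1 - 7.0 / 9.0 * lam * vdd)


def req_endpoints(vdd, lam=0.06, **device):
    """Exact end-point average of Eq. (3.44), before the Taylor expansion."""
    idsat = i_dsat(vdd, **device)
    r_start = vdd / (idsat * (1 + lam * vdd))
    r_end = (vdd / 2) / (idsat * (1 + lam * vdd / 2))
    return 0.5 * (r_start + r_end)


if __name__ == "__main__":
    for name, fn in [("integral", req_integral),
                     ("closed form", req_closed_form),
                     ("end points", req_endpoints)]:
        print(f"{name:12s}: {fn(2.5) / 1e3:.2f} kOhm")
```

For $V_{DD}$ = 2.5 V the three estimates agree within a few percent and land near 13 kΩ.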
Figure 3.27 Simulated equivalent resistance of a minimum size NMOS transistor in 0.25 µm CMOS
process as a function of $V_{DD}$ ($V_{GS} = V_{DD}$, $V_{DS} = V_{DD} \rightarrow V_{DD}/2$). Horizontal axis: $V_{DD}$ (V).



Design Data — Equivalent Resistance Model 




Table 3.3 enumerates the equivalent resistances obtained by simulation of our generic 0.25 µm
CMOS process. These values will come in handy when analyzing the performance of CMOS
gates in later chapters.

Table 3.3 Equivalent resistance $R_{eq}$ (W/L = 1) of NMOS and PMOS transistors in 0.25 µm CMOS process (with
L = $L_{min}$). For larger devices, divide $R_{eq}$ by W/L.

V_DD (V)     1      1.5    2      2.5
NMOS (kΩ)    35     19     15     13
PMOS (kΩ)    115    55     38     31
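
Table 3.3 lends itself to a simple lookup. The Python sketch below stores the tabulated values and applies the scaling rule from the table caption (divide by W/L). The dictionary layout and function name are illustrative assumptions, and no interpolation between supply voltages is attempted.

```python
# Equivalent resistance in kOhm for W/L = 1, copied from Table 3.3.
REQ_KOHM = {
    "nmos": {1.0: 35.0, 1.5: 19.0, 2.0: 15.0, 2.5: 13.0},
    "pmos": {1.0: 115.0, 1.5: 55.0, 2.0: 38.0, 2.5: 31.0},
}


def equivalent_resistance(device, vdd, w_over_l=1.0):
    """Return R_eq in ohms for a tabulated supply voltage and a given W/L."""
    try:
        r_unit = REQ_KOHM[device][vdd]
    except KeyError:
        raise ValueError(f"no Table 3.3 entry for {device!r} at V_DD = {vdd} V") from None
    return r_unit * 1e3 / w_over_l


if __name__ == "__main__":
    # A PMOS device twice as wide as the unit device, at the 2.5 V supply.
    print(equivalent_resistance("pmos", 2.5, w_over_l=2.0), "ohm")
```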



3.3.3 Dynamic Behavior 

The dynamic response of a MOSFET transistor is a sole function of the time it takes to
(dis)charge the parasitic capacitances that are intrinsic to the device, and the extra
capacitance introduced by the interconnecting lines (which are the subject of Chapter 4). A
profound understanding of the nature and the behavior of these intrinsic capacitances is
essential for the designer of high-quality digital integrated circuits. They originate from
three sources: the basic MOS structure, the channel charge, and the depletion regions of
the reverse-biased pn-junctions of drain and source. Aside from the MOS structure
capacitances, all capacitors are nonlinear and vary with the applied voltage, which makes their
analysis hard. We discuss each of the components in turn.
MOS Structure Capacitances

The gate of the MOS transistor is isolated from the conducting channel by the gate oxide
that has a capacitance per unit area equal to $C_{ox} = \epsilon_{ox}/t_{ox}$. We learned earlier that from an
I-V perspective it is useful to have $C_{ox}$ as large as possible, or to keep the oxide thickness
very thin. The total value of this capacitance is called the gate capacitance $C_g$ and can be
decomposed into two elements, each with a different behavior. Obviously, one part of $C_g$
contributes to the channel charge, and is discussed in a subsequent section. Another part is
solely due to the topological structure of the transistor. This component is the subject of
the remainder of this section.

Consider the transistor structure of Figure 3.28. Ideally, the source and drain
diffusion should end right at the edge of the gate oxide. In reality, both source and drain tend to
extend somewhat below the oxide by an amount $x_d$, called the lateral diffusion. Hence, the
effective channel length of the transistor L becomes shorter than the drawn length $L_d$ (or the
length the transistor was originally designed for) by an amount $\Delta L = 2x_d$. It also gives rise
to a parasitic capacitance between gate and source (drain) that is called the overlap
capacitance. This capacitance is strictly linear and has a fixed value



$$C_{GSO} = C_{GDO} = C_{ox} x_d W = C_O W \qquad (3.45)$$

Figure 3.28 MOSFET overlap capacitance (labels: polysilicon gate; panel (b) cross section).

Since $x_d$ is a technology-determined parameter, it is customary to combine it with the
oxide capacitance to yield the overlap capacitance per unit transistor width $C_O$ (more
specifically, $C_{gso}$ and $C_{gdo}$).

Channel Capacitance 

Perhaps the most significant MOS parasitic circuit element, the gate-to-channel
capacitance $C_{GC}$ varies in both magnitude and in its division into three components $C_{GCS}$, $C_{GCD}$,
and $C_{GCB}$ (being the gate-to-source, gate-to-drain, and gate-to-body capacitances,
respectively), depending upon the operation region and terminal voltages. This varying
distribution is best explained with the simple diagrams of Figure 3.29. When the transistor is in
cut-off (a), no channel exists, and the total capacitance $C_{GC}$ appears between gate and
body. In the resistive region (b), an inversion layer is formed, which acts as a conductor
between source and drain. Consequently, $C_{GCB}$ = 0 as the body electrode is shielded from
the gate by the channel. Symmetry dictates that the capacitance distributes evenly between
source and drain. Finally, in the saturation mode (c), the channel is pinched off. The
capacitance between gate and drain is approximately zero, and so is the gate-body
capacitance. All the capacitance hence is between gate and source.



[Figure 3.29 panels: (a) cut-off, (b) resistive, (c) saturation.]

Figure 3.29 The gate-to-channel capacitance and how the operation region influences its distribution over the three other
device terminals.



The actual value of the total gate-channel capacitance and its distribution over the
three components is best understood with the aid of a number of charts. The first plot
(Figure 3.30a) captures the evolution of the capacitance as a function of $V_{GS}$ for $V_{DS}$ = 0. For
$V_{GS}$ = 0, the transistor is off, no channel is present and the total capacitance, equal to
$WLC_{ox}$, appears between gate and body. When increasing $V_{GS}$, a depletion region forms
under the gate. This seemingly causes the thickness of the gate dielectric to increase,
which means a reduction in capacitance. Once the transistor turns on ($V_{GS} = V_T$), a channel
is formed and $C_{GCB}$ drops to 0. With $V_{DS}$ = 0, the device operates in the resistive mode and
the capacitance divides equally between source and drain, or $C_{GCS} = C_{GCD} = WLC_{ox}/2$. The
large fluctuation of the channel capacitance around $V_{GS} = V_T$ is worth remembering. A
designer looking for a well-behaved linear capacitance should avoid operation in this
region.




Figure 3.30 Distribution of the gate-channel capacitance as a function of $V_{GS}$ and $V_{DS}$ (from [Dally98]).

Once the transistor is on, the distribution of its gate capacitance depends upon the
degree of saturation, measured by the $V_{DS}/(V_{GS} - V_T)$ ratio. As illustrated in Figure 3.30b,
$C_{GCD}$ gradually drops to 0 for increasing levels of saturation, while $C_{GCS}$ increases to
$(2/3) C_{ox} WL$. This also means that the total gate capacitance is getting smaller with an increased
level of saturation.

From the above, it becomes clear that the gate-capacitance components are nonlin- 
ear and varying with the operating voltages. To make a first-order analysis possible, we 
will use a simplified model with a constant capacitance value in each region of operation 
in the remainder of the text. The assumed values are summarized in Table 3.4. 



Table 3.4 Average distribution of channel capacitance of MOS transistor for different operation regions.

Operation Region   C_GCB      C_GCS           C_GCD        C_GC            C_G
Cutoff             C_ox WL    0               0            C_ox WL         C_ox WL + 2 C_O W
Resistive          0          C_ox WL/2       C_ox WL/2    C_ox WL         C_ox WL + 2 C_O W
Saturation         0          (2/3) C_ox WL   0            (2/3) C_ox WL   (2/3) C_ox WL + 2 C_O W
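
The entries of Table 3.4 translate directly into a small helper. The following Python sketch returns the capacitance components for a given operation region. The function name and the unit convention (fF and µm) are assumptions for illustration, and the numbers in the usage example are the NMOS values of Table 3.5 applied to a W = 0.36 µm, L = 0.24 µm device.

```python
def gate_capacitances(region, c_ox, c_o, w, l):
    """Average gate capacitance components of Table 3.4.

    c_ox: oxide capacitance per unit area (fF/um^2)
    c_o:  overlap capacitance per unit width (fF/um)
    w, l: transistor width and length (um)
    Returns a dict with C_GCB, C_GCS, C_GCD, C_GC and C_G in fF.
    """
    c_full = c_ox * w * l
    if region == "cutoff":
        c_gcb, c_gcs, c_gcd = c_full, 0.0, 0.0
    elif region == "resistive":
        c_gcb, c_gcs, c_gcd = 0.0, c_full / 2, c_full / 2
    elif region == "saturation":
        c_gcb, c_gcs, c_gcd = 0.0, 2 * c_full / 3, 0.0
    else:
        raise ValueError(f"unknown operation region: {region!r}")
    c_gc = c_gcb + c_gcs + c_gcd
    return {
        "C_GCB": c_gcb,
        "C_GCS": c_gcs,
        "C_GCD": c_gcd,
        "C_GC": c_gc,
        "C_G": c_gc + 2 * c_o * w,  # add source and drain overlap
    }


if __name__ == "__main__":
    for region in ("cutoff", "resistive", "saturation"):
        caps = gate_capacitances(region, c_ox=6.0, c_o=0.31, w=0.36, l=0.24)
        print(region, {k: round(v, 3) for k, v in caps.items()})
```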





Example 3.9 Using a circuit simulator to extract capacitance 

Determining the value of the parasitic capacitances of an MOS transistor for a given operation
mode is a labor-intensive task, and requires the knowledge of a number of technology
parameters that are often not explicitly available. Fortunately, once a SPICE model of the transistor
is obtained, a simple simulation can give you the data you are interested in. Assume we would
like to know the value of the total gate capacitance of a transistor in a given technology as a
function of $V_{GS}$ (for $V_{DS}$ = 0). A simulation of the circuit of Figure 3.31a will give us exactly
this information. In fact, the following relation is valid:

$$I = C_G(V_{GS}) \frac{dV_{GS}}{dt}$$

which can be rewritten to yield an expression for $C_G$.

$$C_G(V_{GS}) = I \Big/ \left( \frac{dV_{GS}}{dt} \right)$$

A transient simulation gives us $V_{GS}$ as a function of time, which can be translated into
the capacitance with the aid of some simple mathematical manipulations. This is
demonstrated in Figure 3.31b, which plots the simulated gate capacitance of a minimum size 0.25
µm NMOS transistor as a function of $V_{GS}$. The graph clearly shows the drop of the
capacitance when $V_{GS}$ approaches $V_T$ and the discontinuity at $V_T$, predicted in Figure 3.30.
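
The "simple mathematical manipulations" amount to a numerical derivative of the simulated waveform. The Python sketch below applies $C_G = I / (dV_{GS}/dt)$ with central differences. The function name is hypothetical, and the waveform in the usage example is synthetic (a constant 0.7 fF capacitor charged by an assumed 100 nA source), standing in for real SPICE output.

```python
def capacitance_from_transient(times, vgs, current):
    """Return (V_GS, C_G) pairs from a transient with a constant charging current.

    times, vgs: equally long sequences of time (s) and gate voltage (V) samples
    current:    value of the charging current source (A)
    """
    if len(times) != len(vgs) or len(times) < 3:
        raise ValueError("need at least three matching time/voltage samples")
    points = []
    for k in range(1, len(times) - 1):
        # Central difference estimate of dV_GS/dt at sample k.
        dv_dt = (vgs[k + 1] - vgs[k - 1]) / (times[k + 1] - times[k - 1])
        points.append((vgs[k], current / dv_dt))
    return points


if __name__ == "__main__":
    current = 100e-9   # assumed charging current (A)
    c_true = 0.7e-15   # synthetic, constant capacitance (F)
    times = [k * 1e-9 for k in range(11)]
    vgs = [current * t / c_true for t in times]
    for v, c in capacitance_from_transient(times, vgs, current):
        print(f"V_GS = {v:.3f} V   C_G = {c * 1e15:.3f} fF")
```

With real simulator output the recovered capacitance would vary with $V_{GS}$; the constant synthetic capacitor only serves to check that the procedure returns the value that was put in.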




Figure 3.31 Simulating the gate capacitance of an MOS transistor: (a) circuit configuration used
for the analysis, (b) resulting capacitance plot for minimum-size NMOS transistor in 0.25 µm
technology.

Junction Capacitances

A final capacitive component is contributed by the reverse-biased source-body and
drain-body pn-junctions. The depletion-region capacitance is nonlinear and decreases when the
reverse bias is raised as discussed earlier. To understand the components of the junction
capacitance (often called the diffusion capacitance), we must look at the source (drain)
region and its surroundings. The detailed picture, shown in Figure 3.32, shows that the
junction consists of two components:



[Figure 3.32 label: channel-stop implant.]





• The bottom-plate junction, which is formed by the source region (with doping $N_D$)
and the substrate with doping $N_A$. The total depletion region capacitance for this
component equals $C_{bottom} = C_j W L_S$, with $C_j$ the junction capacitance per unit area as
given by Eq. (3.9). As the bottom-plate junction is typically of the abrupt type, the
grading coefficient m approaches 0.5.

• The side-wall junction, formed by the source region with doping $N_D$ and the $p^+$
channel-stop implant with doping level $N_A^+$. The doping level of the stopper is usually
larger than that of the substrate, resulting in a larger capacitance per unit area. The
side-wall junction is typically graded, and its grading coefficient varies from 0.33 to
0.5. Its capacitance value equals $C_{sw} = C'_{jsw} x_j (W + 2 \times L_S)$. Notice that no side-wall
capacitance is counted for the fourth side of the source region, as this represents the
conductive channel.⁴

Since $x_j$, the junction depth, is a technology parameter, it is normally combined with
$C'_{jsw}$ into a capacitance per unit perimeter $C_{jsw} = C'_{jsw} x_j$. An expression for the total
junction capacitance can then be derived.
$$C_{diff} = C_{bottom} + C_{sw} = C_j \times \text{AREA} + C_{jsw} \times \text{PERIMETER} = C_j L_S W + C_{jsw} (2L_S + W) \qquad (3.46)$$

Since all these capacitances are small-signal capacitances, we normally linearize them and
use average capacitances along the lines of Eq. (3.10).




Problem 3.1 Using a circuit simulator to determine the drain capacitance 

Derive a simple circuit that would help you to derive the drain capacitance of an NMOS 
transistor in the different operation modes using circuit simulation (in the style of Figure 
3.31). 



Capacitive Device Model 

All the above contributions can be combined in a single capacitive model for the MOS
transistor, which is shown in Figure 3.33. Its components are readily identified on the basis
of the preceding discussions.

$$C_{GS} = C_{GCS} + C_{GSO}; \quad C_{GD} = C_{GCD} + C_{GDO}; \quad C_{GB} = C_{GCB}$$

$$C_{SB} = C_{Sdiff}; \quad C_{DB} = C_{Ddiff} \qquad (3.47)$$

It is essential for the designers of high-performance and low-energy circuits to be very 
familiar with this model as well as to have an intuitive feeling of the relative values of its 
components. 

Example 3.10 MOS Transistor Capacitances 

Consider an NMOS transistor with the following parameters: $t_{ox}$ = 6 nm, L = 0.24 µm, W =
0.36 µm, $L_D = L_S$ = 0.625 µm, $C_O$ = 3 × 10⁻¹⁰ F/m, $C_{j0}$ = 2 × 10⁻³ F/m², $C_{jsw0}$ = 2.75 × 10⁻¹⁰
F/m. Determine the zero-bias value of all relevant capacitances.

The gate capacitance per unit area is easily derived as ($\epsilon_{ox}/t_{ox}$) and equals 5.7 fF/µm².
The gate-to-channel capacitance $C_{GC}$ then equals $WLC_{ox}$ = 0.49 fF. To find the total gate
capacitance, we have to add the source and drain overlap capacitors, each of which equals
$WC_O$ = 0.105 fF. This leads to a total gate capacitance of 0.7 fF.

⁴ To be entirely correct, we should take the diffusion capacitance of the source(drain)-to-channel junction
into account. Due to the doping conditions and the small area, this component can virtually always be ignored in
a first-order analysis. Detailed SPICE models most often include a factor $C_{JSWG}$ to account for this junction.












Figure 3.33 MOSFET capacitance model. 
The diffusion capacitance consists of the bottom and the side-wall capacitances. The
former is equal to $C_{j0} L_D W$ = 0.45 fF, while the side-wall capacitance under zero-bias
conditions evaluates to $C_{jsw0} (2L_D + W)$ = 0.44 fF. This results in a total drain(source)-to-bulk
capacitance of 0.89 fF.

The diffusion capacitance seems to dominate the gate capacitance. This is a worst-case
condition, however. When increasing the value of the reverse bias over the junction, as is
the normal operation mode in MOS circuits, the diffusion capacitance is substantially
reduced. Also, clever design can help to reduce the value of $L_D$ ($L_S$). In general, it can be
stated that the contribution of diffusion capacitances is at most equal to, and very often
substantially smaller than, the gate capacitance.
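
The arithmetic of Example 3.10 can be reproduced in a few lines of Python. The sketch below works in SI units and takes $C_{ox}$ = 5.7 fF/µm² as quoted in the example; the function and key names are illustrative. Note that with the stated $C_O$ the overlap term evaluates to 0.108 fF per side, marginally above the 0.105 fF quoted in the text.

```python
def example_3_10_capacitances():
    """Zero-bias capacitances (in fF) for the transistor of Example 3.10."""
    w, l = 0.36e-6, 0.24e-6   # channel width and length (m)
    l_d = 0.625e-6            # drain/source diffusion length (m)
    c_ox = 5.7e-3             # F/m^2, i.e. 5.7 fF/um^2 as quoted in the text
    c_o = 3e-10               # overlap capacitance per unit width (F/m)
    c_j0 = 2e-3               # bottom-plate junction capacitance (F/m^2)
    c_jsw0 = 2.75e-10         # side-wall junction capacitance (F/m)

    c_gc = w * l * c_ox                  # gate-to-channel
    c_overlap = w * c_o                  # per side (source or drain)
    c_bottom = c_j0 * l_d * w            # bottom-plate junction
    c_sw = c_jsw0 * (2 * l_d + w)        # side-wall junction, three sides

    to_ff = 1e15
    return {
        "gate_channel": c_gc * to_ff,
        "overlap_per_side": c_overlap * to_ff,
        "gate_total": (c_gc + 2 * c_overlap) * to_ff,
        "bottom": c_bottom * to_ff,
        "side_wall": c_sw * to_ff,
        "diffusion_total": (c_bottom + c_sw) * to_ff,
    }


if __name__ == "__main__":
    for name, value in example_3_10_capacitances().items():
        print(f"{name:17s}: {value:.2f} fF")
```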



Design Data — MOS Transistor Capacitances 




Table 3.5 summarizes the parameters needed to estimate the parasitic capacitances of the MOS
transistors in our generic 0.25 µm CMOS process.

Table 3.5 Capacitance parameters of NMOS and PMOS transistors in 0.25 µm CMOS process.

        C_ox        C_O       C_j         m_j    φ_b    C_jsw     m_jsw   φ_bsw
        (fF/µm²)    (fF/µm)   (fF/µm²)           (V)    (fF/µm)           (V)
NMOS    6           0.31      2           0.5    0.9    0.28      0.44    0.9
PMOS    6           0.27      1.9         0.48   0.9    0.22      0.32    0.9



Source-Drain Resistance 

The performance of a CMOS circuit may further be affected by another set of parasitic
elements, being the resistances in series with the drain and source regions, as shown in
Figure 3.34a. This effect becomes more pronounced when transistors are scaled down, as
this leads to shallower junctions and smaller contact openings. The resistance
of the drain (source) region can be expressed as

$$R_{S,D} = \frac{L_{S,D}}{W} R_\square + R_C \qquad (3.48)$$

with $R_C$ the contact resistance, W the width of the transistor, and $L_{S,D}$ the length of the
source or drain region (Figure 3.34b). $R_\square$ is the sheet resistance per square of the
drain-source diffusion, and ranges from 20 to 100 Ω/□. Observe that the resistance of a square
of material is constant, independent of its size (see also Chapter 4).

The series resistance causes a deterioration in the device performance, as it reduces
the drain current for a given control voltage. Keeping its value as small as possible is thus
an important design goal for both the device and the circuit engineer. One option, popular
in most contemporary processes, is to cover the drain and source regions with a
low-resistivity material such as titanium or tungsten. This process is called silicidation and
effectively reduces the sheet resistance to values in the range from 1 to 4 Ω/□.⁵ Making the
transistor wider than needed is another possibility as should be obvious from Eq. (3.48).
With a process that includes silicidation and proper attention to layout, parasitic resistance
is not important. However, the reader should be aware that careless layout may lead to
resistances that severely degrade the device performance.
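
Eq. (3.48) is easy to explore numerically. The Python sketch below counts the number of squares $L_{S,D}/W$ and multiplies by the sheet resistance. The function name is illustrative, the geometry is borrowed from Example 3.10, and the contact resistance is set to zero purely as a placeholder because no value is given in the text.

```python
def series_resistance(l_sd, w, r_sheet, r_contact=0.0):
    """R_S,D = (L_S,D / W) * R_sheet + R_C, with both lengths in the same unit.

    r_sheet and r_contact are in ohm/square and ohm, respectively.
    """
    if w <= 0:
        raise ValueError("transistor width must be positive")
    return (l_sd / w) * r_sheet + r_contact


if __name__ == "__main__":
    l_sd, w = 0.625, 0.36  # um, geometry of Example 3.10
    cases = [
        ("unsilicided, low", 20.0),
        ("unsilicided, high", 100.0),
        ("silicided, low", 1.0),
        ("silicided, high", 4.0),
    ]
    for label, r_sheet in cases:
        print(f"{label:18s}: {series_resistance(l_sd, w, r_sheet):6.1f} ohm")
```

Under these assumptions the diffusion contributes about 1.7 squares, so the series resistance drops from tens of ohms (or more) to a few ohms once the region is silicided, and doubling W halves the diffusion term.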

3.3.4 The Actual MOS Transistor — Some Secondary Effects 

The operation of a contemporary transistor may show some important deviations from the
model we have presented so far. These divergences become especially pronounced once
the dimensions of the transistor reach the deep sub-micron realm. At that point, the
assumption that the operation of a transistor is adequately described by a one-dimensional
model, where it is assumed that all current flows on the surface of the silicon and the
electrical fields are oriented along that plane, is no longer valid. Two- or even
three-dimensional models are more appropriate. An example of such was already given in Section
3.2.2 when we discussed the mobility degradation.

The understanding of some of these second-order effects and their impact on the 
device behavior is essential in the design of today’s digital circuits and therefore merits 
some discussion. One word of warning, though. Trying to take all those effects into 
account in a manual, first-order analysis results in intractable and opaque circuit models. It 
is therefore advisable to analyze and design MOS circuits first using the ideal model. The 
impact of the non-idealities can be studied in a second round using computer-aided simu- 
lation tools with more precise transistor models. 



[Figure 3.34 panels: (a) Modeling the series resistance; (b) Parameters of the series resistance.]

Figure 3.34 Series drain and source resistance.



5 Silicidation is also used to reduce the resistance of the polysilicon gate, as will be discussed in Chapter 
[Figure 3.35 panels: (a) Threshold as a function of the length (for low $V_{DS}$); (b) Drain-induced barrier lowering (for low L).]

Figure 3.35 Threshold variations.




Threshold Variations 

Eq. (3.19) states that the threshold voltage is only a function of the manufacturing
technology and the applied body bias $V_{SB}$. The threshold can therefore be considered as a constant
over all NMOS (PMOS) transistors in a design. As the device dimensions are reduced, this
model becomes inaccurate, and the threshold potential becomes a function of L, W, and
$V_{DS}$. Two-dimensional second-order effects that were ignorable for long-channel devices
suddenly become significant.

In the traditional derivation of $V_{T0}$, for instance, it is assumed that the channel
depletion region is solely due to the applied gate voltage and that all depletion charge
beneath the gate originates from the MOS field effects. This ignores the depletion regions
of the source and reverse-biased drain junction, which become relatively more important
with shrinking channel lengths. Since a part of the region below the gate is already
depleted (by the source and drain fields), a smaller threshold voltage suffices to cause
strong inversion. In other words, $V_{T0}$ decreases with L for short-channel devices (Figure
3.35a). A similar effect can be obtained by raising the drain-source (bulk) voltage, as this
increases the width of the drain-junction depletion region. Consequently, the threshold
decreases with increasing $V_{DS}$. This effect, called drain-induced barrier lowering, or
DIBL, causes the threshold potential to be a function of the operating voltages (Figure
3.35b). For high enough values of the drain voltage, the source and drain regions can even
be shorted together, and normal transistor operation ceases to exist. The sharp increase in
current that results from this effect, which is called punch-through, may cause permanent
damage to the device and should be avoided. Punch-through hence sets an upper bound on
the drain-source voltage of the transistor.

Since the majority of the transistors in a digital circuit are designed at the minimum
channel length, the variation of the threshold voltage as a function of the length is almost
uniform over the complete design, and is therefore not much of an issue except for the
increased sub-threshold leakage currents. More troublesome is the DIBL, as this effect
varies with the operating voltage. This is, for instance, a problem in dynamic memories,
where the leakage current of a cell (being the subthreshold current of the access transistor)
becomes a function of the voltage on the data-line, which depends upon the applied data
patterns. From the cell perspective, DIBL manifests itself as a data-dependent noise
source.
Worth mentioning is that the threshold of the MOS transistor is also subject to
narrow-channel effects. The depletion region of the channel does not stop abruptly at the
edges of the transistor, but extends somewhat under the isolating field-oxide. The gate
voltage must support this extra depletion charge to establish a conducting channel. This
effect is ignorable for wide transistors, but becomes significant for small values of W,
where it results in an increase of the threshold voltage. For small geometry transistors,
with small values of L and W, the effects of short and narrow channels may tend to cancel
each other out.

Hot-Carrier Effects 

Besides varying over a design, threshold voltages in short-channel devices also have the
tendency to drift over time. This is the result of the hot-carrier effect [Hu92]. Over the last
decades, device dimensions have been scaled down continuously, while the power supply
and the operating voltages were kept constant. The resulting increase in the electrical field
strength causes an increasing velocity of the electrons, which can leave the silicon and
tunnel into the gate oxide upon reaching a high-enough energy level. Electrons trapped in
the oxide change the threshold voltage, typically increasing the thresholds of NMOS
devices, while decreasing the $V_T$ of PMOS transistors. For an electron to become hot, an
electrical field of at least $10^4$ V/cm is necessary. This condition is easily met in devices
with channel lengths around or below 1 µm. The hot-electron phenomenon can lead to a
long-term reliability problem, where a circuit might degrade or fail after being in use for a
while. This is illustrated in Figure 3.36, which shows the degradation in the I-V
characteristics of an NMOS transistor after it has been subjected to extensive operation.
State-of-the-art MOSFET technologies therefore use specially-engineered drain and source regions
to ensure that the peaks in the electrical fields are bounded, hence preventing carriers from
reaching the critical values necessary to become hot. The reduced supply voltage that is
typical for deep sub-micron technologies can in part be attributed to the necessity to keep
hot-carrier effects under control.




Figure 3.36 Hot-carrier effects cause the I-V characteristics of an NMOS transistor to degrade from 
extensive usage (from [McGaughy98]). 
CMOS Latchup

The MOS technology contains a number of intrinsic bipolar transistors. These are
especially troublesome in CMOS processes, where the combination of wells and substrates
results in the formation of parasitic n-p-n-p structures. Triggering these thyristor-like
devices leads to a shorting of the $V_{DD}$ and $V_{SS}$ lines, usually resulting in a destruction of
the chip, or at best a system failure that can only be resolved by power-down.

Consider the n-well structure of Figure 3.37a. The n-p-n-p structure is formed by the
source of the NMOS, the p-substrate, the n-well and the source of the PMOS. A circuit
equivalent is shown in Figure 3.37b. When one of the two bipolar transistors gets forward
biased (e.g., due to current flowing through the well, or substrate), it feeds the base of the
other transistor. This positive feedback increases the current until the circuit fails or
burns out.




(a) Origin of latchup (b) Equivalent circuit 



Figure 3.37 CMOS latchup. 

From the above analysis the message to the designer is clear: to avoid latchup, the
resistances $R_{nwell}$ and $R_{psubs}$ should be minimized. This can be achieved by providing
numerous well and substrate contacts, placed close to the source connections of the
NMOS/PMOS devices. Devices carrying a lot of current (such as transistors in the I/O
drivers) should be surrounded by guard rings. These circular well/substrate contacts,
positioned around the transistor, reduce the resistance even further and reduce the gain of the
parasitic bipolars. For an extensive discussion on how to avoid latchup, please refer to
[Weste93]. The latchup effect was especially critical in early CMOS processes. In recent
years, process innovations and improved design techniques have all but eliminated the
risks for latchup.

3.3.5 SPICE Models for the MOS Transistor 

The complexity of the behavior of the short-channel MOS transistor and its many parasitic
effects has led to the development of a wealth of models with varying degrees of accuracy
and computing efficiency. In general, more accuracy also means more complexity and,
hence, an increased run time. In this section, we briefly discuss the characteristics of the
more popular MOSFET models, and describe how to instantiate a MOS transistor in a
circuit description.

SPICE Models 

SPICE has three built-in MOSFET models, selected by the LEVEL parameter in the 
model card. Unfortunately, all these models have been rendered obsolete by the progres- 
sion to short-channel devices. They should only be used for first-order analysis, and we 
therefore limit ourselves to a short discussion of their main properties. 

• The LEVEL 1 SPICE model implements the Shichman-Hodges model, which is
based on the square-law long-channel expressions derived earlier in this chapter. It
does not handle short-channel effects.

• The LEVEL 2 model is a geometry-based model, which uses detailed device physics 
to define its equations. It handles effects such as velocity saturation, mobility degra- 
dation, and drain-induced barrier lowering. Unfortunately, including all 3D-effects 
of an advanced submicron process in a pure physics-based model becomes complex 
and inaccurate. 

• LEVEL 3 is a semi-empirical model. It relies on a mixture of analytical and empirical
expressions, and uses measured device data to determine its main parameters. It
works quite well for channel lengths down to 1 µm.
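A minimal Python sketch can make the limitation of the LEVEL 1 equations concrete. It compares the square-law current with a velocity-saturated estimate for a minimum-size NMOS device (W/L = 0.375/0.25) at V_GS = V_DS = 2.5 V. The values k'_n = 115 µA/V², V_T0 = 0.43 V, and λ = 0.06 V⁻¹ are the ones quoted in the problem set of this chapter; V_DSAT = 0.63 V and the function name are assumptions made for illustration, and the formula is the simplified model for manual analysis, not the equations of any built-in SPICE level.

```python
def drain_current(vgs, vds, w_over_l, kp=115e-6, vt=0.43, lam=0.06, vdsat=None):
    """First-order NMOS drain current in amperes.

    With vdsat=None this is the long-channel square-law model. Passing a
    value for vdsat (an assumed parameter) clamps the current to mimic
    velocity saturation.
    """
    vgt = vgs - vt
    if vgt <= 0:
        return 0.0  # cut-off; subthreshold conduction is ignored here
    limits = [vgt, vds] if vdsat is None else [vgt, vds, vdsat]
    vmin = min(limits)
    return kp * w_over_l * (vgt * vmin - 0.5 * vmin ** 2) * (1 + lam * vds)


w_over_l = 0.375 / 0.25
long_channel = drain_current(2.5, 2.5, w_over_l)
short_channel = drain_current(2.5, 2.5, w_over_l, vdsat=0.63)
print(f"square law:         {long_channel * 1e6:.0f} uA")
print(f"velocity saturated: {short_channel * 1e6:.0f} uA")
```

Under these assumptions the square-law estimate comes out at roughly twice the velocity-saturated one, which is why LEVEL 1 results should be read as first-order only for short-channel devices.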

In response to the inadequacy of the built-in models, SPICE vendors and semiconductor
manufacturers have introduced a wide range of accurate, but proprietary models. A
complete description of all those would take the remainder of this book, which is,
obviously, not the goal. We refer the interested reader to the extensive literature on this topic
[e.g., Vladimirescu93].

The BSIM3v3 SPICE Model

The confusing situation of having to use a different model for each manufacturer has for- 
tunately been partially resolved by the adoption of the BSIM3v3 model as an industry- 
wide standard for the modeling of deep-submicron MOSFET transistors. The Berkeley 
Short-Channel IGFET Model (or BSIM in short) provides a model that is analytically 
simple and is based on a ‘small’ number of parameters, which are normally extracted from 
experimental data. Its popularity and accuracy make it the natural choice for all the simu- 
lations presented in this book. 

A full-fledged BSIM3v3 model (denoted as LEVEL 49) contains over 200 parame- 
ters, the majority of which are related to the modeling of second-order effects. Fortu- 
nately, understanding the intricacies of all these parameters is not a requirement for the 
digital designer. We therefore only present an overview of the parameter categories (Table 
3.6). The Bin category deserves some extra attention. Providing a single set of parameters 
that is acceptable over all possible device dimensions is deemed to be next to impossible. 
So, a set of models is provided, each of which is valid for a limited region delineated by 
LMIN, LMAX, WMIN, and WMAX (called a bin). It is typically left to the user to select
the correct bin for a particular transistor. 
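Since bin selection is usually left to the user, a small helper can automate it. The Python sketch below is only an illustration: the class and function names, the bin boundaries, and the inclusive-lower/exclusive-upper bound convention are all hypothetical and would have to be replaced by the values and conventions of the actual model library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelBin:
    name: str
    lmin: float  # metres
    lmax: float
    wmin: float
    wmax: float

    def covers(self, length, width):
        # Assumed convention: lower bound inclusive, upper bound exclusive.
        return self.lmin <= length < self.lmax and self.wmin <= width < self.wmax


# Hypothetical bin boundaries, for illustration only.
NMOS_BINS = [
    ModelBin("nmos.1", 0.24e-6, 0.5e-6, 0.3e-6, 1.0e-6),
    ModelBin("nmos.2", 0.24e-6, 0.5e-6, 1.0e-6, 10e-6),
    ModelBin("nmos.3", 0.5e-6, 10e-6, 0.3e-6, 1.0e-6),
    ModelBin("nmos.4", 0.5e-6, 10e-6, 1.0e-6, 10e-6),
]


def select_bin(bins, length, width):
    """Return the name of the first bin whose L/W window contains the device."""
    for model_bin in bins:
        if model_bin.covers(length, width):
            return model_bin.name
    raise ValueError(f"no bin covers L={length:g} m, W={width:g} m")


print(select_bin(NMOS_BINS, 0.25e-6, 0.375e-6))
```

Raising an error for a device that falls outside every bin is a deliberate choice here: silently picking the nearest bin would hide exactly the kind of modeling mistake the bins are meant to prevent.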



Table 3.6 BSIM3v3 model parameter categories, and some important parameters.

Parameter Category   Description (important parameters)
------------------   ----------------------------------
Control              Selection of level and models for mobility, capacitance, and noise
                     (LEVEL, MOBMOD, CAPMOD)
DC                   Parameters for threshold and current calculations
                     (VTH0, K1, U0, VSAT, RSH)
AC & Capacitance     Parameters for capacitance computations
                     (CGS(D)O, CJ, MJ, CJSW, MJSW)
dW and dL            Derivation of effective channel length and width
Process              Process parameters such as oxide thickness and doping concentrations
                     (TOX, XJ, GAMMA1, NCH, NSUB)
Temperature          Nominal temperature and temperature coefficients for various device
                     parameters (TNOM)
Bin                  Bounds on device dimensions for which model is valid
                     (LMIN, LMAX, WMIN, WMAX)
Flicker Noise        Noise model parameters






We refer the interested reader to the BSIM3v3 documentation provided on the website
of the textbook (REFERENCE) for a complete description of the model parameters
and equations. The LEVEL-49 models for our generic 0.25 µm CMOS process can be
found at the same location.

Transistor Instantiation 

The parameters that can be specified for an individual transistor are enumerated in 
Table 3.7. Not all these parameters have to be defined for each transistor. SPICE assumes 
default values (which are often zero!) for the missing factors. 



WARNING: It is hard to expect accuracy from a simulator, when the circuit description 
provided by the designer does not contain the necessary details. For instance, you must 
accurately specify the area and the perimeter of the source and drain regions of the devices 
when performing a performance analysis. Lacking this information, which is used for the 
computation of the parasitic capacitances, your transient simulation will be next to use- 
less. Similarly, it is often necessary to painstakingly define the value of the drain and 
source resistance. The NRS and NRD values multiply the sheet resistance specified in the 
transistor model for an accurate representation of the parasitic series source and drain 
resistance of each transistor. 
Table 3.7 SPICE transistor parameters.

Parameter Name                Symbol   SPICE Name   Units   Default Value
---------------------------   ------   ----------   -----   -------------
Drawn Length                  L        L            m       -
Effective Width               W        W            m       -
Source Area                   AREA     AS           m²      0
Drain Area                    AREA     AD           m²      0
Source Perimeter              PERM     PS           m       0
Drain Perimeter               PERM     PD           m       0
Squares of Source Diffusion            NRS          -       1
Squares of Drain Diffusion             NRD          -       1

Example 3.11 SPICE description of a CMOS inverter

An example of a SPICE description of a CMOS inverter, consisting of an NMOS and a
PMOS transistor, is given below. Transistor M1 is an NMOS device of model-type (and bin)
nmos.1 with its drain, gate, source, and body terminals connected to nodes nvout, nvin, 0, and
0, respectively. Its gate length is the minimum allowed in this technology (0.25 µm). The '+'
character at the start of line 2 indicates that this line is a continuation of the previous one.

The PMOS device of type pmos.1, connected between nodes nvout, nvin, nvdd, and
nvdd (D, G, S, and B, respectively), is three times wider, which reduces the series resistance,
but increases the parasitic diffusion capacitances as the area and perimeter of the drain and
source regions go up.

Finally, the .lib line refers to the file that contains the transistor models.

M1 nvout nvin 0 0 nmos.1 W=0.375U L=0.25U
+AD=0.24P PD=1.625U AS=0.24P PS=1.625U NRS=1 NRD=1
M2 nvout nvin nvdd nvdd pmos.1 W=1.125U L=0.25U
+AD=0.7P PD=2.375U AS=0.7P PS=2.375U NRS=0.33 NRD=0.33
.lib 'c:\Design\Models\cmos025.1'
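The area and perimeter values in the instance lines above can be generated rather than typed by hand. The Python sketch below assumes a rectangular source/drain region whose perimeter excludes the edge under the gate, with a diffusion length of 0.625 µm; that length is inferred from the numbers in the example (it reproduces the perimeters exactly and the areas after rounding) and is not a stated design rule. The function names are hypothetical, and NRS and NRD are left out, so the default from Table 3.7 would apply.

```python
def diffusion_geometry(width, diff_length=0.625e-6):
    """Area (m^2) and perimeter (m) of a rectangular source/drain region.

    Assumption: the edge under the gate is left out of the perimeter, and
    diff_length = 0.625 um is inferred from the example, not a design rule.
    """
    area = width * diff_length
    perimeter = width + 2 * diff_length
    return area, perimeter


def mos_instance(name, drain, gate, source, body, model, width, length):
    """Format a two-line SPICE MOSFET instance with AD/PD/AS/PS filled in."""
    area, perimeter = diffusion_geometry(width)
    return (
        f"{name} {drain} {gate} {source} {body} {model} "
        f"W={width * 1e6:g}U L={length * 1e6:g}U\n"
        f"+AD={area * 1e12:.3g}P PD={perimeter * 1e6:g}U "
        f"AS={area * 1e12:.3g}P PS={perimeter * 1e6:g}U"
    )


print(mos_instance("M1", "nvout", "nvin", "0", "0", "nmos.1", 0.375e-6, 0.25e-6))
print(mos_instance("M2", "nvout", "nvin", "nvdd", "nvdd", "pmos.1", 1.125e-6, 0.25e-6))
```

Generating these fields from the drawn width keeps the parasitic capacitances consistent when a transistor is resized, which is the situation the WARNING above is concerned with.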



3.4 A Word on Process Variations 

The preceding discussions have assumed that a device is adequately modeled by a single 
set of parameters. In reality, the parameters of a transistor vary from wafer to wafer, or 
even between transistors on the same die, depending upon the position. This observed ran- 
dom distribution between supposedly identical devices is primarily the result of two fac- 
tors: 

1. Variations in the process parameters, such as impurity concentration densities, oxide
thicknesses, and diffusion depths, caused by nonuniform conditions during the
deposition and/or the diffusion of the impurities. These result in diverging values for
sheet resistances, and transistor parameters such as the threshold voltage.

2. Variations in the dimensions of the devices, mainly resulting from the limited
resolution of the photolithographic process. This causes deviations in the (W/L) ratios of
MOS transistors and the widths of interconnect wires.

Observe that quite a number of these deviations are totally uncorrelated. For instance, 
variations in the length of an MOS transistor are unrelated to variations in the threshold 
voltage as both are set by different process steps. Below we examine the impact on some 
of the parameters that determine the transistor current. 

• The threshold voltage V_T can vary for numerous reasons: changes in oxide thickness,
substrate, poly and implant impurity levels, and the surface charge. Accurate
control of the threshold voltage is an important goal for many reasons. Where in the
past thresholds could vary by as much as 50%, state-of-the-art digital processes
manage to control the thresholds to within 25-50 mV.

• k'_n: The main cause for variations in the process transconductance is changes in
oxide thickness. Variations can also occur in the mobility but to a lesser degree.

• Variations in W and L. These are mainly caused by the lithographic process.
Observe that variations in W and L are totally uncorrelated since the first is
determined in the field-oxide step, while the second is defined by the polysilicon
definition and the source and drain diffusion processes.

The measurable impact of the process variations may be a substantial deviation of
the circuit behavior from the nominal or expected response, and this could be in either
the positive or the negative direction. This confronts the designer with an important economic
dilemma. Assume, for instance, that you are supposed to design a microprocessor running
at a clock frequency of 500 MHz. It is economically important that the majority of the
manufactured dies meet that performance requirement. One way to achieve that goal is to
design the circuit assuming worst-case values for all possible device parameters. While
safe, this approach is prohibitively conservative and results in severely overdesigned and
hence uneconomical circuits.

To help the designer make a decision on how much margin to provide, the device 
manufacturer commonly provides fast and slow device models in addition to the nominal 
ones. These result in larger or smaller currents than expected, respectively. 

Example 3.12 MOS Transistor Process Variations 

To illustrate the possible impact of process variations on the performance of an MOS device,
consider a minimum-size NMOS device in our generic 0.25 µm CMOS process. A later chapter
will establish that the speed of the device is proportional to the drain current that can be
delivered.

Assume initially that V_GS = V_DS = 2.5 V. From earlier simulations, we know that this
produces a drain current of 220 µA. The nominal model is now replaced by the fast and slow
models, which modify the length and width (±10%), threshold (±60 mV), and oxide thickness
(±5%) parameters of the device. Simulations produce the following data:

Fast: I_d = 265 µA: +20%

Slow: I_d = 182 µA: -17%

Let us now proceed one step further. The supply voltage delivered to a circuit is by no means
a constant either. For instance, the voltage delivered by a battery can drop off substantially
towards the end of its lifetime. In practice, a variation of 10% in the supply voltage may well
be expected.

Fast + V_dd = 2.75 V: I_d = 302 µA: +37%

Slow + V_dd = 2.25 V: I_d = 155 µA: -30%

The current levels and the associated circuit performance can thus vary by almost 100%
between the extreme cases. To guarantee that the fabricated circuits meet the performance
requirements under all circumstances, it is necessary to make the transistor 42%
(= 220 µA/155 µA) wider than would be required in the nominal case. This translates into a
severe area penalty.
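The percentages in this example follow directly from the simulated currents. The short Python sketch below redoes the arithmetic; the variable and function names are illustrative, and the currents are simply the values quoted above (in µA).

```python
NOMINAL_UA = 220.0

# Simulated drain currents quoted in the example, in microamperes.
corners_ua = {
    "fast": 265.0,
    "slow": 182.0,
    "fast, Vdd = 2.75 V": 302.0,
    "slow, Vdd = 2.25 V": 155.0,
}


def percent_deviation(current_ua, nominal_ua=NOMINAL_UA):
    """Deviation from the nominal current, in percent."""
    return 100.0 * (current_ua - nominal_ua) / nominal_ua


for corner, current in corners_ua.items():
    print(f"{corner:20s} {current:5.0f} uA  {percent_deviation(current):+.0f}%")

# Extra width needed so the slowest corner still delivers the nominal current.
extra_width_percent = 100.0 * (NOMINAL_UA / min(corners_ua.values()) - 1.0)
# Ratio between the two extreme corners.
spread = max(corners_ua.values()) / min(corners_ua.values())
print(f"extra width: {extra_width_percent:.0f}%, fast/slow ratio: {spread:.2f}")
```

The fast/slow ratio of just under 2 is what the text refers to as a variation of almost 100% between the extreme cases.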









Fortunately, these worst- (or best-) case conditions occur only very rarely in reality. 
The probability that all parameters assume their worst-case values simultaneously is very 
low, and most designs will display a performance centered around the nominal design. 
The art of the design for manufacturability is to center the nominal design so that the 
majority of the fabricated circuits (e.g., 98%) will meet the performance specifications, 
while keeping the area overhead minimal. 

Specialized design tools to help meet this goal are available. For instance, the Monte 
Carlo analysis approach [Jensen91] simulates a circuit over a wide range of randomly cho- 
sen values for the device parameters. The result is a distribution plot of design parameters 
(such as the speed or the sensitivity to noise) that can help to determine if the nominal 
design is economically viable. Examples of such distribution plots, showing the impact of 
variations in the effective transistor channel length and the PMOS transistor thresholds on 
the speed of an adder cell, are shown in Figure 3.38. As can be observed, technology vari- 
ations can have a substantial impact on the performance parameters of a design. 
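The idea behind such a Monte Carlo analysis can be sketched in a few lines of Python. The sketch below is not a substitute for a circuit simulator: it uses a simplified velocity-saturated current expression for a single minimum-size NMOS device, treats the ±10% (L, W), ±60 mV (V_T), and ±5% (t_ox) ranges of Example 3.12 as 3σ limits of independent Gaussian distributions, keeps V_DSAT fixed at an assumed 0.63 V, and uses a hypothetical specification of 90% of the nominal current. All of these choices, as well as the function names, are assumptions made for illustration.

```python
import random
import statistics


def drain_current(vgs, vds, width, length, kp, vt, lam=0.06, vdsat=0.63):
    """Velocity-saturated first-order drain current (amperes)."""
    vgt = vgs - vt
    if vgt <= 0:
        return 0.0
    vmin = min(vgt, vds, vdsat)
    return kp * (width / length) * (vgt * vmin - 0.5 * vmin ** 2) * (1 + lam * vds)


def monte_carlo_currents(n_samples=10000, seed=1):
    """Sample independent parameter variations and return the drain currents."""
    rng = random.Random(seed)
    currents = []
    for _ in range(n_samples):
        length = rng.gauss(0.25e-6, 0.10 * 0.25e-6 / 3)
        width = rng.gauss(0.375e-6, 0.10 * 0.375e-6 / 3)
        vt = rng.gauss(0.43, 0.060 / 3)
        tox_ratio = rng.gauss(1.0, 0.05 / 3)
        kp = 115e-6 / tox_ratio  # k' is proportional to C_ox, i.e. to 1/t_ox
        currents.append(drain_current(2.5, 2.5, width, length, kp, vt))
    return currents


nominal = drain_current(2.5, 2.5, 0.375e-6, 0.25e-6, 115e-6, 0.43)
currents = monte_carlo_currents()
target = 0.9 * nominal  # hypothetical specification
yield_fraction = sum(i >= target for i in currents) / len(currents)
print(f"nominal I_D : {nominal * 1e6:.0f} uA")
print(f"mean I_D    : {statistics.mean(currents) * 1e6:.0f} uA")
print(f"sigma I_D   : {statistics.stdev(currents) * 1e6:.1f} uA")
print(f"yield       : {yield_fraction:.1%}")
```

A fixed seed makes the run repeatable; a real analysis would use measured parameter distributions, including any correlations between them, instead of the independent Gaussians assumed here.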





Figure 3.38 Distribution plots of speed of adder circuit as a function of varying device parameters, as obtained
by a Monte Carlo analysis (horizontal axes: L_eff in µm, and V_Tp in V). The circuit is implemented in a 2 µm
(nominal) CMOS technology (courtesy of Eric Boskin, UCB, and ATMEL corp.).



One important conclusion from the above discussion is that SPICE simulations
should be treated with care. The device parameters presented in a model represent average
values, measured over a batch of manufactured wafers. Actual implementations are bound
to differ from the simulation results, and for reasons other than imperfections in the
modeling approach. Be furthermore aware that temperature variations on the die can present
another source of parameter deviations. Optimizing an MOS circuit with SPICE to a
resolution level of a picosecond or a microvolt is clearly a waste of effort.



3.5 Perspective: Technology Scaling 

Over the last decades, we have observed a spectacular increase in integration density and 
computational complexity of digital integrated circuits. As already argued in the introduc- 
tion, applications that were considered implausible yesterday are already forgotten today. 
Underlying this revolution are the advances in device manufacturing technology that 
allow for a steady reduction of the minimum feature size such as the minimum transistor 
channel length realizable on a chip. To illustrate this point, we have plotted in Figure 3.39 
the evolution of the (average) minimum device dimensions starting from the 1960s and 
projecting into the 21st century. We observe a reduction rate of approximately 13% per 
year, halving every 5 years. Another interesting observation is that no real sign of a slow- 
down is in sight, and that the breathtaking pace will continue in the foreseeable future. 
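The link between the annual reduction rate and the halving period is a matter of compounding, as the following Python sketch shows. The function names are illustrative; the 13% and 5-year figures are the ones quoted above.

```python
def relative_size_after(years, annual_reduction=0.13):
    """Feature size relative to today after compounding the annual reduction."""
    return (1.0 - annual_reduction) ** years


def annual_reduction_for_halving(period_years=5):
    """Annual reduction rate that halves the feature size in the given period."""
    return 1.0 - 0.5 ** (1.0 / period_years)


print(f"size after 5 years at 13%/year: {relative_size_after(5):.3f}")
print(f"rate for exact halving in 5 years: {annual_reduction_for_halving(5):.2%}")
```

A 13% yearly reduction compounds to a factor very close to 0.5 over five years, so the two statements in the text describe the same trend.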




Figure 3.39 Evolution of (average) minimum channel length of MOS transistors over time
(horizontal axis: year). Dots represent observed or projected (2000 and beyond) values. The
continuous line represents a scaling scenario that reduces the minimum feature size by a
factor of 2 every 5 years.



A pertinent question is how this continued reduction in feature size influences the 
operating characteristics and properties of the MOS transistor, and indirectly the critical 
digital design metrics such as switching frequency and power dissipation. A first-order 
projection of this behavior is called a scaling analysis, and is the topic of this section. In 
addition to the minimum device dimension, we have to consider the supply voltage as a 
second independent variable in such a study. Different scaling scenarios result based on 
how these two independent variables are varied with respect to each other [Dennard74,
Baccarani84].

Three different models are studied in Table 3.8. To make the results tractable, it is 
assumed that all device dimensions scale by the same factor S (with S > 1 for a reduction 
in size). This includes the width and length of the transistor, the oxide thickness, and the 
junction depths. Similarly, we assume that all voltages, including the supply voltage and 
the threshold voltages, scale by the same ratio U. The relations governing the scaling
behavior of the dependent variables are tabulated in column 2. Observe that this analysis only
considers short-channel devices with a linear dependence between control voltage and
saturation current (as expressed by Eq. (3.39)). We discuss each scenario in turn.



Table 3.8 Scaling scenarios for short-channel devices.

Parameter         Relation       Full Scaling   General Scaling   Fixed-Voltage Scaling
---------------   ------------   ------------   ---------------   ---------------------
W, L, t_ox                       1/S            1/S               1/S
V_DD, V_T                        1/S            1/U               1
N_sub             V/W_depl²      S              S²/U              S²
Area/Device       WL             1/S²           1/S²              1/S²
C_ox              1/t_ox         S              S                 S
C_gate            C_ox WL        1/S            1/S               1/S
k_n, k_p          C_ox W/L       S              S                 S
I_sat             C_ox W V       1/S            1/U               1
Current Density   I_sat/Area     S              S²/U              S²
R_on              V/I_sat        1              1                 1
Intrinsic Delay   R_on C_gate    1/S            1/S               1/S
P                 I_sat V        1/S²           1/U²              1
Power Density     P/Area         1              S²/U²             S²

Full Scaling (Constant Electrical Field Scaling) 

In this ideal model, voltages and dimensions are scaled by the same factor S. The goal is to
keep the electrical field patterns in the scaled device identical to those in the original
device. Keeping the electrical fields constant ensures the physical integrity of the device
and avoids breakdown or other secondary effects. This scaling leads to greater device
density (Area), higher performance (Intrinsic Delay), and reduced power consumption (P).
The effects of full scaling on the device and circuit parameters are summarized in the third
column of Table 3.8. We use the intrinsic time constant, which is the product of the gate
capacitance and the on-resistance, as a measure for the performance. The analysis shows
that the on-resistance remains constant due to the simultaneous scaling of voltage swing
and current level. The performance improvement is solely due to the reduced capacitance. The
results clearly demonstrate the beneficial effects of scaling: the speed of the circuit
increases in a linear fashion, while the power/gate scales down quadratically!⁶

⁶ Some assumptions were made when deriving this table:

1. It is assumed that the carrier mobilities are not affected by the scaling.

2. The substrate doping N_sub is scaled so that the maximum depletion-layer width is reduced by a
factor S.

3. It is furthermore assumed that the delay of the device is mainly determined by the intrinsic
capacitance (the gate capacitance) and that other device capacitances, such as the diffusion
capacitances, scale appropriately. This assumption is approximately true for the full-scaling
case, but not for fixed-voltage scaling, where C_diff scales as 1/√S.
Figure 3.40 Evolution of min and max supply voltage in digital integrated circuits as a function
of feature size (horizontal axis: minimum feature size in µm). All values for 0.15 micron and
below are projected.




Fixed-Voltage Scaling 

In reality, full scaling is not a feasible option. First of all, to keep new devices compatible 
with existing components, voltages cannot be scaled arbitrarily. Having to provide for 
multiple supply voltages adds considerably to the cost of a system. As a result, voltages 
have not been scaled down along with feature sizes, and designers adhere to well-defined 
standards for supply voltages and signal levels. As is illustrated in Figure 3.40, 5 V was
the de facto standard for all digital components up to the early 1990s, and a fixed-voltage
scaling model was followed.

Only with the introduction of the 0.5 µm CMOS technology did new standards such
as 3.3 V and 2.5 V make an inroad. Today, a closer tracking between voltage and device
dimension can be observed. The reason for this change in operation model can partially be
explained with the aid of the fixed-voltage scaling model, summarized in the fifth column
of Table 3.8. In a velocity-saturated device, keeping the voltage constant while scaling the
device dimensions does not give a performance advantage over the full-scaling model, but
instead comes with a major power penalty. The gain of an increased current is simply
offset by the higher voltage level, and only hurts the power dissipation. This scenario is very
different from the situation that existed when transistors were operating in the long-channel
mode, and the current was a quadratic function of the voltage (as per Eq. (3.29)).
Keeping the voltage constant under these circumstances gives a distinct performance
advantage, as it causes a net reduction in on-resistance.

While the above argumentation offers ample reason to scale the supply voltages 
with the technology, other physical phenomena such as the hot-carrier effect and oxide 
breakdown also contributed to making the fixed-voltage scaling model unsustainable. 



Problem 3.2 Scaling of Long-channel Devices 

Demonstrate that for a long-channel transistor, fixed-voltage scaling results in a reduction
of the intrinsic delay by a factor S², while increasing the power dissipation/device by S.
Reconstruct Table 3.8 assuming that the current is a quadratic function of the voltage (Eq.
(3.29)).



WARNING: The picture painted in the previous section represents a first-order model. 
Increasing the supply voltage still offers somewhat of a performance benefit for short- 
channel transistors. This is apparent in Figure 3.27 and Table 3.3, which show a reduction 
of the equivalent on-resistance with increasing supply voltage — even for the high voltage 
range. Yet, this effect, which is mostly due to the channel-length modulation, is secondary 
and is far smaller than what would be obtained in case of long-channel devices. 

The reader should keep this warning in the back of his mind throughout this scaling 
study. The goal is to discover first-order trends. This implies ignoring second-order effects 
such as mobility-degradation, series resistance, etc. 



General Scaling 

We observe in Figure 3.40 that the supply voltages, while moving downwards, are not
scaling as fast as the technology. For instance, for the technology scaling from 0.5 µm to
0.1 µm, the maximum supply voltage only reduces from 5 V to 1.5 V. The obvious
question is why not stick to the full-scaling model, when keeping the voltage higher does not
yield any convincing benefits? This departure is motivated by the following arguments:



• Some of the intrinsic device voltages such as the silicon bandgap and the built-in 
junction potential, are material parameters and cannot be scaled. 

• The scaling potential of the transistor threshold voltage is limited. Making the 
threshold too low makes it difficult to turn off the device completely. This is aggra- 
vated by the large process variation of the value of the threshold, even on the same 
wafer. 

Therefore, a more general scaling model is needed, where dimensions and voltages
are scaled independently. This general scaling model is shown in the fourth column of
Table 3.8. Here, device dimensions are scaled by a factor S, while voltages are reduced by
a factor U. When the voltage is held constant, U = 1, and the scaling model reduces to the
fixed-voltage model. Note that the general-scaling model offers a performance scenario
identical to the full- and the fixed-voltage scaling, while its power dissipation lies between the two
models (for S > U > 1).
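The three scenarios can be compared numerically by evaluating the relations of Table 3.8 for given S and U. The Python sketch below derives each dependent quantity from the relation column (for instance I_sat from C_ox W V); the function name and the sample values S = 2 and U = 1.5 are chosen purely for illustration.

```python
def scaling_factors(s, u):
    """First-order scaling of short-channel device metrics.

    s: factor by which dimensions shrink (s > 1); u: same for voltages.
    Each entry is the ratio of the scaled value to the original one.
    """
    dim = 1.0 / s              # W, L, t_ox
    volt = 1.0 / u             # V_DD, V_T
    c_ox = 1.0 / dim           # C_ox ~ 1/t_ox
    area = dim * dim           # W L
    c_gate = c_ox * area       # C_ox W L
    i_sat = c_ox * dim * volt  # C_ox W V (velocity-saturated device)
    r_on = volt / i_sat
    power = i_sat * volt
    return {
        "N_sub": volt / dim ** 2,
        "area/device": area,
        "C_ox": c_ox,
        "C_gate": c_gate,
        "I_sat": i_sat,
        "current density": i_sat / area,
        "R_on": r_on,
        "intrinsic delay": r_on * c_gate,
        "P": power,
        "power density": power / area,
    }


scenarios = {
    "full (U = S)": scaling_factors(2.0, 2.0),
    "general (S > U > 1)": scaling_factors(2.0, 1.5),
    "fixed-voltage (U = 1)": scaling_factors(2.0, 1.0),
}
for name, factors in scenarios.items():
    print(f"{name:22s} delay x{factors['intrinsic delay']:.2f}  "
          f"P x{factors['P']:.2f}  power density x{factors['power density']:.2f}")
```

For S = 2 the intrinsic delay halves in all three cases, while the power per device ranges from one quarter (full scaling) to unchanged (fixed voltage), with the general model in between.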

Verifying the Model 

To summarize this discussion on scaling, we have combined in Table 3.9 the characteristics
of some of the most recent CMOS processes and projections on some future ones.
Observe how the operating voltages are being continuously reduced with diminishing
device dimensions. As predicted by the scaling model, the maximum drive current
remains approximately constant. Maintaining this level of drive in the presence of a
reduced supply voltage requires an aggressive lowering of the threshold voltage, which
translates into a rapid increase of the sub-threshold leakage current.



Table 3.9 MOSFET technology projection for high performance logic (from [SIA01]).

Year of Introduction           2001   2003   2005   2007   2010   2013   2016
----------------------------   ----   ----   ----   ----   ----   ----   ----
Drawn channel length (nm)      90     65     45     35     25     18     13
Physical channel length (nm)   65     45     32     25     18     13     9
Gate oxide (nm)                2.3    2.0    1.9    1.4    1.2    1.0    0.9
V_DD (V)                       1.2    1.0    0.9    0.7    0.6    0.5    0.4
NMOS I_Dsat (µA/µm)            900    900    900    900    1200   1500   1500
NMOS I_leak (µA/µm)            0.01   0.07   0.3    1      3      7      10

From the above, it is reasonable to conclude that both integration density and perfor- 
mance will continue to increase. The obvious question is for how long? Experimental 25 
nm CMOS devices have proven to be operational in the laboratories and to display current 
characteristics that are surprisingly close to present-day transistors. These transistors, 
while working along similar concepts as the current MOS devices, look very different 
from the structures we are familiar with, and require some substantial device engineering. 
For instance, Figure 3.41 shows a potential transistor structure, the Berkeley FinFET dual-
gated transistor, which has proven to be operational up to very small channel lengths.




Figure 3.41 FinFET dual-gated transistor
with 25 nm channel length [Hu99].



Another option is the vertical transistor. Even while the addition of many metal layers has 
turned the integrated circuit into a truly three-dimensional artifact, the transistor itself is 
still mostly laid out in a horizontal plane. This forces the device designer to jointly opti- 
mize packing density and performance parameters. By rotating the device so that the drain 
ends up on top, and the source at the bottom, these concerns are separated: packing density 
still is dominated by horizontal dimensions, while performance issues are mostly
determined by vertical spacings (Figure 3.42). Operational devices of this type have been
fabricated with channel lengths substantially below 0.1 µm [Eaglesham00].

Figure 3.42 Vertical transistor with dual gates. The photo on the right shows an enlarged view of the channel area.

Integrated circuits integrating more than one billion transistors clocked at speeds of
tens of GHz hence seem to be well under way. Whether this will actually happen is an
open question. Even though it might be technologically feasible, other parameters have an
equal impact on the feasibility of such an undertaking. A first doubt is whether such a part can be
manufactured in an economical way. Current semiconductor plants cost over $5 billion,
and this price is expected to rise substantially with smaller feature sizes. Design
considerations also play a role. Power consumption of such a component might be prohibitive. The
growing role of interconnect parasitics might put an upper bound on performance. Finally,
system considerations might determine what level of integration is really desirable. All in
all, it is obvious that the design of semiconductor circuits still faces an exciting future.






3.6 Summary 

In this chapter, we have presented a comprehensive overview of the operation of the
MOSFET transistor, the semiconductor device at the core of virtually all contemporary 
digital integrated circuits. Besides an intuitive understanding of its behavior, we have pre- 
sented a variety of modeling approaches ranging from simple models, useful for a first- 
order manual analysis of the circuit operation, to complex SPICE models. These models 
will be used extensively in later chapters, where we look at the fundamental building 
blocks of digital circuits. We started off with a short discussion of the semiconductor 
diode, one of the most dominant parasitic circuit elements in CMOS designs. 

• The static behavior of the junction diode is well described by the ideal diode equa- 
tion that states that the current is an exponential function of the applied voltage bias. 

• In reverse-biased mode, the depletion-region space charge of the diode can be mod- 
eled as a non-linear voltage-dependent capacitance. This is particularly important as 
the omnipresent source-body and drain-body junctions of the MOS transistors all 
operate in this mode. A linearized large-scale model for the depletion capacitance 
was introduced for manual analysis. 

• The MOS(FET) transistor is a voltage-controlled device, where the controlling gate
terminal is insulated from the conducting channel by a SiO2 capacitor. Based on the
value of the gate-source voltage with respect to a threshold voltage V_T, three
operation regions have been identified: cut-off, linear, and saturation. One of the most
enticing properties of the MOS transistor, which makes it particularly amenable to
digital design, is that it approximates a voltage-controlled switch: when the control
voltage is low, the switch is nonconducting (open); for a high control voltage, a
conducting channel is formed, and the switch can be considered closed. This two-state
operation matches the concepts of binary digital logic.

• The continuing reduction of the device dimensions to the submicron range has intro- 
duced some substantial deviations from the traditional long-channel MOS transistor 
model. The most important one is the velocity saturation effect, which changes the 
dependence of the transistor current with respect to the controlling voltage from 
quadratic to linear. Models for this effect as well as other second-order parasitics 
have been introduced. One particular effect that is gaining in importance is the
subthreshold conduction, which causes devices to conduct current even when the
control voltage drops below the threshold.

• The dynamic operation of the MOS transistor is dominated by the device capacitors. 
The main contributors are the gate capacitance and the capacitance formed by the 
depletion regions of the source and drain junctions. The minimization of these 
capacitances is the prime requirement in high-performance MOS design. 

• SPICE models and their parameters have been introduced for all devices. It was
observed that these models represent an average behavior, and that the actual device
parameters can vary over a single wafer or die.

• The MOS transistor is expected to dominate the digital integrated circuit scene for 
the next decade. Continued scaling will lead to device sizes of approximately 0.07 
micron by the year 2010, and logic circuits integrating more than 1 billion transis- 
tors on a die. 



3.7 To Probe Further 

Semiconductor devices have been discussed in numerous books, reprint volumes,
tutorials, and journal articles. The IEEE Transactions on Electron Devices is the premier journal,
where most of the state-of-the-art devices and their modeling are discussed. Another
valuable resource is the proceedings of the International Electron Devices Meeting (IEDM).
The books (such as [Streetman95] and [Pierret96]) and journal articles referenced below
contain excellent discussions of the semiconductor devices of interest or refer to specific
topics brought up in the course of this chapter.
Chapter 3 Problem Set






For all problems, use the device parameters provided in Chapter 3 (Tables 3.2 and 3.5) and the inside 
back book cover, unless otherwise mentioned. Also assume T = 300 K by default. 

1. [E, SPICE, 3.2.2] 

a. Consider the circuit of Figure 0.1. Using the simple model, with V_Don = 0.7 V, solve for I_D.

b. Find I_D and V_D using the ideal diode equation. Use I_S = 10^-14 A and T = 300 K.

c. Solve for V m , V m , and fusing SPICE. 

d. Repeat parts b and c using I_S = 10^-16 A, T = 300 K, and I_S = 10^-14 A, T = 350 K.




Figure 0.1 Resistor diode circuit. 



2. [M, None, 3.2.3] For the circuit in Figure 0.2, V_s = 3.3 V. Assume A_D = 12 µm², φ_0 = 0.65 V,
and m = 0.5. N_A = 2.5 E16 and N_D = 5 E15.

a. Find I_D and V_D.

b. Is the diode forward- or reverse-biased?

c. Find the depletion region width, W_j, of the diode.

d. Use the parallel-plate model to find the junction capacitance, C_j.

e. Set V_s = 1.5 V. Again using the parallel-plate model, explain qualitatively why C_j
increases.




Figure 0.2 Series diode circuit 



3. [E, None, 3.3.2] Figure 0.3 shows NMOS and PMOS devices with drain, source, and gate
ports annotated. Determine the mode of operation (saturation, linear, or cutoff) and drain
current $I_D$ for each of the biasing configurations given below. Verify with SPICE. Use the
following transistor data: NMOS: $k'_n = 115\ \mu\text{A/V}^2$, $V_{T0} = 0.43$ V, $\lambda = 0.06\ \text{V}^{-1}$; PMOS: $k'_p = 30\ \mu\text{A/V}^2$,
$V_{T0} = -0.4$ V, $\lambda = -0.1\ \text{V}^{-1}$. Assume $(W/L) = 1$.

a. NMOS: $V_{GS} = 2.5$ V, $V_{DS} = 2.5$ V. PMOS: $V_{GS} = -0.5$ V, $V_{DS} = -1.25$ V.

b. NMOS: $V_{GS} = 3.3$ V, $V_{DS} = 2.2$ V. PMOS: $V_{GS} = -2.5$ V, $V_{DS} = -1.8$ V.

c. NMOS: $V_{GS} = 0.6$ V, $V_{DS} = 0.1$ V. PMOS: $V_{GS} = -2.5$ V, $V_{DS} = -0.7$ V.

Figure 0.3 NMOS and PMOS devices.

4. [E, SPICE, 3.3.2] Using SPICE, plot the I-V characteristics for the following devices.

a. NMOS $W = 1.2$ µm, $L = 0.25$ µm

b. NMOS $W = 4.8$ µm, $L = 0.5$ µm

c. PMOS $W = 1.2$ µm, $L = 0.25$ µm

d. PMOS $W = 4.8$ µm, $L = 0.5$ µm

5. [E, SPICE, 3.3.2] Indicate on the plots from Problem 4:

a. the regions of operation.

b. the effects of channel length modulation.

c. Which of the devices are in velocity saturation? Explain how this can be observed on the
I-V plots.

6. [M, None, 3.3.2] Given the data in Table 0.1 for a short-channel NMOS transistor with
$V_{DSAT} = 0.6$ V and $k' = 100\ \mu\text{A/V}^2$, calculate $V_{T0}$, $\gamma$, $\lambda$, $2|\phi_F|$, and $W/L$:



Table 0.1 Measured NMOS transistor data

      V_GS    V_DS    V_BS    I_D (µA)
1     2.5     1.8     0       1812
2     2       1.8     0       1297
3     2       2.5     0       1361
4     2       1.8     -1      1146
5     2       1.8     -2      1039



7. [E, None, 3.3.2] Given Table 0.2, the goal is to derive the important device parameters from
these data points. As the measured transistor is processed in a deep-submicron technology, the
'unified model' holds. From the material constants, we also could determine that the saturation
voltage $V_{DSAT}$ equals -1 V. You may also assume that $-2\Phi_F = -0.6$ V.

NOTE: The parameter values on Table 3.3 do NOT hold for this problem.

a. Is the measured transistor a PMOS or an NMOS device? Explain your answer.

b. Determine the value of $V_{T0}$.

c. Determine $\gamma$.

d. Determine $\lambda$.

e. Given the obtained answers, determine for each of the measurements the operation region
of the transistor (choose from cutoff, resistive, saturated, and velocity saturated). Annotate
your finding in the right-most column of Table 0.2.



Table 0.2 Measurements taken from the MOS device, at different terminal voltages.

Measurement   V_GS     V_DS     V_SB     I_D        Operation
number        (V)      (V)      (V)      (µA)       Region?
1             -2.5     -2.5     0        -84.375
2             1        1        0        0.0
3             -0.7     -0.8     0        -1.04
4             -2.0     -2.5     0        -56.25
5             -2.5     -2.5     -1       -72.0
6             -2.5     -1.5     0        -80.625
7             -2.5     -0.8     0        -66.56





8. [M, None, 3.3.2] An NMOS device is plugged into the test configuration shown below in
Figure 0.4. The input $V_{in} = 2$ V. The current source draws a constant current of 50 µA. R is a
variable resistor that can assume values between 10 kΩ and 30 kΩ. Transistor M1 experiences
short-channel effects and has the following transistor parameters: $k' = 110 \times 10^{-6}\ \text{A/V}^2$, $V_T = 0.4$ V,
and $V_{DSAT} = 0.6$ V. The transistor has $W/L = 2.5\mu/0.25\mu$. For simplicity, body effect and
channel-length modulation can be neglected, i.e., $\lambda = 0$ and $\gamma = 0$.



Figure 0.4 Test configuration for the NMOS device (labels: 2 V, M1 with W/L = 2.5µ/0.25µ, I = 50 µA).



a. When $R = 10$ kΩ, find the operation region, $V_D$ and $V_S$.

b. When $R = 30$ kΩ, again determine the operation region, $V_D$ and $V_S$.

c. For the case of $R = 10$ kΩ, would $V_S$ increase or decrease if $\lambda \neq 0$? Explain qualitatively.

9. [M, None, 3.3.2] Consider the circuit configuration of Figure 0.5.

a. Write down the equations (and only those) which are needed to determine the voltage at
node X. Do NOT plug in any values yet. Neglect short-channel effects and assume that
$\lambda_p = 0$.

b. Draw the (approximate) load lines for both MOS transistor and resistor. Mark some of
the significant points.

c. Determine the required width of the transistor (for $L = 0.25$ µm) such that X equals 1.5 V.

d. We have, so far, assumed that $M_1$ is a long-channel device. Redraw the load lines
assuming that $M_1$ is velocity-saturated. Will the voltage at X rise or fall?

Figure 0.5 MOS circuit (labels: 2.5 V, $R_1 = 20$ kΩ, $M_1$).




10. [M, None, 3.3.2] The circuit of Figure 0.6 is known as a source-follower configuration. It
achieves a DC level shift between the input and output. The value of this shift is determined
by the current $I_0$. Assume $\gamma = 0.4$, $2|\phi_F| = 0.6$ V, $V_{T0} = 0.43$ V, $k' = 115\ \mu\text{A/V}^2$, and $\lambda = 0$. The
NMOS device has $W/L = 5.4\mu/1.2\mu$ such that the short-channel effects are not observed.

a. Derive an expression giving $V_i$ as a function of $V_o$ and $V_T(V_o)$. If we neglect body effect,
what is the nominal value of the level shift performed by this circuit?

b. The NMOS transistor experiences a shift in $V_T$ due to the body effect. Find $V_T$ as a function
of $V_o$ for $V_o$ ranging from 0 to 1.5 V with 0.25 V intervals. Plot $V_T$ vs. $V_o$.

c. Plot $V_o$ vs. $V_i$ as $V_o$ varies from 0 to 1.5 V with 0.25 V intervals. Plot two curves: one
neglecting the body effect and one accounting for it. How does the body effect influence
the operation of the level converter? At $V_o$(body effect) = 1.5 V, find $V_o$(ideal) and, thus,
determine the maximum error introduced by body effect.




Figure 0.6 Source-follower level converter. 



11. [M, SPICE, 3.3.2] Problem 11 uses the MOS circuit of Figure 0.7.

a. Plot $V_{out}$ vs. $V_{in}$ with $V_{in}$ varying from 0 to 2.5 volts (use steps of 0.5 V). $V_{DD} = 2.5$ V.

b. Repeat a using SPICE.

c. Repeat a and b using a MOS transistor with $(W/L) = 4/1$. Is the discrepancy between
manual and computer analysis larger or smaller? Explain why.

Figure 0.7 MOS circuit.



12. [E, None, 3.3.2] Below in Figure 0.8 is an I-V transfer curve for an NMOS transistor. In this
problem, the objective is to use this I-V curve to obtain information about the transistor. The
transistor has $(W/L) = (1\mu/1\mu)$. It may also be assumed that velocity saturation does not play a
role in this example. Also assume $-2\Phi_F = 0.6$ V. Using Figure 0.8, determine the following
device parameters: $V_{T0}$, $\gamma$, $\lambda$.



Figure 0.8 I-V curves (curves labeled Vgs = 1.0, 1.5, 2.0, and 2.5 V, each for Vbs = 0.0 V and
Vbs = -1.0 V; vertical axis from 0 to 320u in steps of 20u).
13. [E, None, 3.3.2] The curves below in Figure 0.9 represent the gate voltage ($V_{GS}$) vs. drain
current ($I_{DS}$) of two NMOS devices which are on the same die and operate in the subthreshold
region. Due to process variations on the same die, the curves do not overlap.




Figure 0.9 Subthreshold 
current curves. Difference is 
due to process variations 



Also assume that the transistors are within the same circuit configuration as in Figure 0.10.
If the input voltages are both equal to 0.2 V, what would be the respective durations to
discharge the load of $C_L = 1$ pF attached to the drains of these devices?

Figure 0.10 The circuit for testing the time to discharge the load capacitance through a device
operating in subthreshold region (labels: M1, $C_L = 1$ pF).



14. [M, None, 3.3.2] Short-channel effects:

a. Use the fact that current can be expressed as the product of the carrier charge per unit
length and the velocity of carriers ($I_{DS} = Qv$) to derive $I_{DS}$ as a function of $W$, $C_{ox}$, $V_{GS} - V_T$,
and carrier velocity $v$.

b. For a long-channel device, the carrier velocity is the mobility times the applied electric
field. The electrical field, which has dimensions of V/m, is simply $(V_{GS} - V_T)/2L$. Derive
$I_{DS}$ for a long-channel device.

c. From the equation derived in a, find $I_{DS}$ for a short-channel device in terms of the
maximum carrier velocity, $v_{max}$.

Based on the results of b and c, describe the most important differences between short-channel
and long-channel devices.
15. [C, None, 3.3.2] Another equation, which models the velocity-saturated drain current of an
MOS transistor, is given by



$$I_{DSAT} = \frac{1}{1 + (V_{GS} - V_T)/(E_{sat} L)} \left(\frac{\mu_0 C_{ox}}{2}\right) \frac{W}{L} (V_{GS} - V_T)^2$$

Using this equation it is possible to see that velocity saturation can be modeled by a MOS
device with a source-degeneration resistor (see Figure 0.11).

a. Find the value of $R_S$ such that $I_{DSAT}(V_{GS}, V_{DS})$ for the composite transistor in the figure
matches the above velocity-saturated drain current equation. Hint: the voltage drop across
$R_S$ is typically small.

b. Given $E_{sat} = 1.5$ V/µm and $k' = \mu_0 C_{ox} = 20\ \mu\text{A/V}^2$, what value of $R_S$ is required to model
velocity saturation? How does this value depend on $W$ and $L$?

Figure 0.11 Source-degeneration model of velocity saturation.



16. [E, None, 3.3.2] The equivalent resistances of two different devices need to be computed.

a. First, consider the fictive device whose I-V characteristic is given in Figure 0.12. Constant
k has the dimension of S (or 1/Ω). $V_0$ is a voltage characteristic to the device. Calculate the
equivalent resistance for an output voltage transition from 0 to $2V_0$ by integrating the
resistance as a function of the voltage.

Figure 0.12 Fictive device whose equivalent resistance is to be calculated. The plot shows
I (A) versus V (V) for $I = k V e^{V/V_0}$.

b. Next, obtain the resistance equation 3.43 using Figure 0.13. Assuming that $V_{GS}$ is kept
at $V_{DD}$, calculate $R_{eq}$ as the output ($V_{DS}$) transitions from $V_{DD}$ to $V_{DD}/2$ (Figure 0.13).
Hint: Make sure you use the short-channel unified MOS model equations. Hint: You
will need to use the expansion $\ln(1 + x) \approx x - x^2/2 + x^3/3$.

Figure 0.13 The equivalent resistance is to be computed for the H → L transition
($V_{DD} \rightarrow V_{DD}/2$).



17. [M, None, 3.3.3] Compute the gate and diffusion capacitances for transistor M1 of Figure 0.7.
Assume that drain and source areas are rectangular, and are 1 µm wide and 0.5 µm long. Use
the parameters of Example 3.5 to determine the capacitance values. Assume $m_j = 0.5$ and
$m_{jsw} = 0.44$. Also compute the total charge stored at node in, for the following initial
conditions:

a. $V_{in} = 2.5$ V, $V_{out} = 2.5$ V, 0.5 V, and 0 V.

b. $V_{in} = 0$ V, $V_{out} = 2.5$ V, 0.5 V, and 0 V.

18. [E, None, 3.3.3] Consider a CMOS process with the following capacitive parameters for the
NMOS transistor: $C_{GSO}$, $C_{GDO}$, $C_{ox}$, $C_j$, $m_j$, $C_{jsw}$, $m_{jsw}$, and PB, with the lateral diffusion equal
to $L_D$. The MOS transistor M1 is characterized by the following parameters: W, L, AD, PD,
AS, PS.

Figure 0.14 Circuit to measure total input capacitance

a. Consider the configuration of Figure 0.14. $V_{DD}$ is equal to $V_T$ (the threshold voltage of the
transistor). Assume that the initial value of $V_g$ equals 0. A current source with value $I_{in}$ is
applied at time 0. Assuming that all the capacitance at the gate node can be lumped into a
single, grounded, linear capacitance $C_T$, derive an expression for the time it will take for
$V_g$ to reach $2V_T$.

b. The obvious question is now how to compute $C_T$. Among $C_{db}$, $C_{sb}$, $C_{gs}$, $C_{gd}$, $C_{gb}$, which of
these parasitic capacitances of the MOS transistor contribute to $C_T$? For those that contribute
to $C_T$, write down the expression that determines the value of the contribution. Use only
the parameters given above. If the transistor goes through different operation regions and
this impacts the value of the capacitor, determine the expression of the contribution for
each region (and indicate the region).

c. Consider now the case depicted in Figure 0.15. Assume that $V_d$ is initially at 0 and we
want to charge it up to $2V_T$. Again, among $C_{db}$, $C_{sb}$, $C_{gs}$, $C_{gd}$, $C_{gb}$, which device
capacitances contribute to the total drain capacitance? Once again, make sure you differentiate
between different operation regions.

Figure 0.15 Circuit to measure the total drain capacitance

19. [M, None, 3.3.3] For the NMOS transistor in Figure 0.16, sketch the voltages at the source
and at the drain as a function of time. Initially, both source and drain are at +2.5 volts. Note
that the drain is open circuited. The 10 µA current source is turned on at t = 0. Device
parameters: $W/L_{eff} = 125\mu/0.25\mu$; $\mu C_{ox} = 100\ \mu\text{A/V}^2$; $C_{ox} = 6\ \text{fF}/\mu\text{m}^2$; $C_{OL}$ (per width) = 0.3 fF/µm;
$C_{sb} = 100$ fF; $C_{db} = 100$ fF; $V_{DSAT} = 1$ V.

HINT: Do not try to solve this analytically. Just use a qualitative analysis to derive the
different operation modes of the circuit and the devices.

Figure 0.16 Device going through different operation regions over time (labels: $C_{db}$, $C_{sb}$).



20. [C, SPICE, 3.4] Though impossible to quantify exactly by hand, it is a good idea to understand
process variations and be able to at least get a rough estimate for the limits of their effects.

a. For the circuit of Figure 0.7, calculate nominal, minimum, and maximum currents in the
NMOS device with $V_{in}$ = 0 V, 2.5 V and 5 V. Assume 3σ variations in $V_{T0}$ of 25 mV, in $k'$ of
15%, and in lithographic etching of 0.15 µm.

b. Analyze the impact of these current variations on the output voltage. Assume that the load
resistor also can vary by 10%. Verify the result with SPICE.

21. [E, None, 3.5] A state-of-the-art, synthesizable, embedded microprocessor consumes
0.4 mW/MHz when fabricated using a 0.18 µm process. With typical standard cells (gates),
the area of the processor is 0.7 mm². Assume a 100 MHz clock frequency and a 1.8 V power
supply. Assume short-channel devices, but ignore second-order effects like mobility
degradation, series resistance, etc.

a. Using fixed voltage scaling and constant frequency, what will the area, power consumption,
and power density of the same processor be, if scaled to 0.12 µm technology, assuming
the same clock frequency?

b. If the supply voltage in the scaled 0.12 µm part is reduced to 1.5 V, what will the power
consumption and power density be?

c. How fast could the scaled processor in Part (b) be clocked? What would the power and
power density be at this new clock frequency?

d. Power density is important for cooling the chip and packaging. What would the supply
voltage have to be to maintain the same power density as the original processor?

22. The superscalar, superpipelined, out-of-order executing, highly parallel, fully x86 compatible
JMRII microprocessor was fabricated in a 0.25 µm technology and was able to operate at 100
MHz, consuming 10 watts using a 2.5 V power supply.

a. Using fixed voltage scaling, what will the speed and power consumption of the same
processor be if scaled to 0.1 µm technology?

b. If the supply voltage on the 0.1 µm part were scaled to 1.0 V, what will the power
consumption and speed be?

c. What supply should be used to fix the power consumption at 10 watts? At what speed
would the processor operate?
4.1 Introduction

Throughout most of the past history of integrated circuits, on-chip interconnect wires were 
considered to be second class citizens that had only to be considered in special cases or 
when performing high-precision analysis. With the introduction of deep-submicron semiconductor
technologies, this picture is undergoing rapid changes. The parasitic effects
introduced by the wires display a scaling behavior that differs from the active devices such 
as transistors, and tend to gain in importance as device dimensions are reduced and circuit 
speed is increased. In fact, they start to dominate some of the relevant metrics of digital 
integrated circuits such as speed, energy-consumption, and reliability. This situation is 
aggravated by the fact that improvements in technology make the production of ever- 
larger die sizes economically feasible, which results in an increase in the average length of 
an interconnect wire and in the associated parasitic effects. A careful and in-depth analysis 
of the role and the behavior of the interconnect wire in a semiconductor technology is 
therefore not only desirable, but even essential. 



4.2 A First Glance 

The designer of an electronic circuit has multiple choices in realizing the interconnections 
between the various devices that make up the circuit. State-of-the-art processes offer multiple
layers of Aluminum, and at least one layer of polysilicon. Even the heavily doped $n^+$
or $p^+$ layers, typically used for the realization of source and drain regions, can be
employed for wiring purposes. These wires appear in the schematic diagrams of electronic 
circuits as simple lines with no apparent impact on the circuit performance. In our discus- 
sion on the integrated-circuit manufacturing process, it became clear that this picture is 
overly simplistic, and that the wiring of today’s integrated circuits forms a complex geom- 
etry that introduces capacitive, resistive, and inductive parasitics. All three have multiple 
effects on the circuit behavior. 

1. An increase in propagation delay, or, equivalently, a drop in performance. 

2. An impact on the energy dissipation and the power distribution. 

3. An introduction of extra noise sources, which affects the reliability of the circuit. 

A designer can decide to play it safe and include all these parasitic effects in her analysis 
and design optimization process. This conservative approach is non-constructive and even 
unfeasible. First of all, a “complete” model is dauntingly complex and is only applicable 
to very small topologies. It is hence totally useless for today’s integrated circuits with their 
millions of circuit nodes. Furthermore, this approach has the disadvantage that the “forest
gets lost between the trees”. The circuit behavior at a given circuit node is only determined
by a few dominant parameters. Bringing all possible effects to bear only obscures the picture
and turns the optimization and design process into a “trial-and-error” operation rather than
an enlightened and focused search.

To achieve the latter, it is important that the designer has a clear insight into the parasitic
wiring effects, their relative importance, and their models. This is best illustrated with
the simple example shown in Figure 4.1. Each wire in a bus network connects a transmitter
(or transmitters) to a set of receivers and is implemented as a link of wire segments of
various lengths and geometries. Assume that all segments are implemented on a single
interconnect layer, isolated from the silicon substrate and from each other by a layer of
dielectric material. Be aware that the reality may be far more complex.

Figure 4.1 Schematic and physical views of wiring of bus-network (panels: schematics view, with
transmitters and receivers; physical view). The latter shows only a limited area (as
indicated by the shadings in the schematics).

A full-fledged circuit model, taking into account the parasitic capacitance, resis- 
tance, and the inductance of the interconnections, is shown in Figure 4.2a. Observe that 
these extra circuit elements are not located in a single physical point, but are distributed 
over the length of the wire. This is a necessity when the length of the wire becomes signif- 
icantly larger than its width. Notice also that some parasitics are inter-wire, hence creating 
coupling effects between the different bus-signals that were not present in the original 
schematics. 

Analyzing the behavior of this schematic, which only models a small part of the cir- 
cuit, is slow and cumbersome. Fortunately, substantial simplifications can often be made, 
some of which are enumerated below. 

• Inductive effects can be ignored if the resistance of the wire is substantial — this is 
for instance the case for long Aluminum wires with a small cross-section — or if the 
rise and fall times of the applied signals are slow. 

• When the wires are short, the cross-section of the wire is large, or the interconnect 
material used has a low resistivity, a capacitance-only model can be used (Figure 
4.2b). 

• Finally, when the separation between neighboring wires is large, or when the wires 
only run together for a short distance, inter-wire capacitance can be ignored, and all 
the parasitic capacitance can be modeled as capacitance to ground. 

Obviously, the latter problems are the easiest to model, analyze, and optimize. The
experienced designer knows to differentiate between dominant and secondary effects. The
goal of this chapter is to present to the reader the basic techniques to estimate the values of
the various interconnect parameters, simple models to evaluate their impact, and a set of
rules-of-thumb to decide when and where a particular model or effect should be considered.

Figure 4.2 Wire models for the circuit of Figure 4.1. Model (a) considers most of the wire parasitics (with
the exception of inter-wire resistance and mutual inductance), while model (b) only considers capacitance.



4.3 Interconnect Parameters — Capacitance, Resistance, and Inductance 

4.3.1 Capacitance 

An accurate modeling of the wire capacitance(s) in a state-of-the-art integrated circuit is a 
non-trivial task and is even today the subject of advanced research. The task is compli- 
cated by the fact that the interconnect structure of contemporary integrated circuits is 
three-dimensional, as was clearly demonstrated in the process cross-section of Figure 2.8. 
The capacitance of such a wire is a function of its shape, its environment, its distance to 
the substrate, and the distance to surrounding wires. Rather than getting lost in complex 
equations and models, a designer typically will use an advanced extraction tool to get pre- 
cise values of the interconnect capacitances of a completed layout. Most semiconductor 
manufacturers also provide empirical data for the various capacitance contributions, as 
measured from a number of test dies. Yet, some simple first-order models come in handy 
to provide a basic understanding of the nature of interconnect capacitance and its parame- 
ters, and of how wire capacitance will evolve with future technologies. 

Consider first a simple rectangular wire placed above the semiconductor substrate, 
as shown in Figure 4.3. If the width of the wire is substantially larger than the thickness of 
the insulating material, it may be assumed that the electrical-field lines are orthogonal to 
the capacitor plates, and that its capacitance can be modeled by the parallel-plate capacitor
model (also called area capacitance). Under those circumstances, the total capacitance
of the wire can be approximated as (see footnote 1)

$$c_{int} = \frac{\varepsilon_{di}}{t_{di}} WL \qquad (4.1)$$

where W and L are respectively the width and length of the wire, and $t_{di}$ and $\varepsilon_{di}$ represent
the thickness of the dielectric layer and its permittivity. SiO$_2$ is the dielectric material of
choice in integrated circuits, although some materials with lower permittivity, and hence
lower capacitance, are coming into use. Examples of the latter are organic polyimides and
aerogels. $\varepsilon$ is typically expressed as the product of two terms, or $\varepsilon = \varepsilon_r \varepsilon_0$. $\varepsilon_0 = 8.854 \times 10^{-12}$ F/m
is the permittivity of free space, and $\varepsilon_r$ the relative permittivity of the insulating
material. Table 4.1 presents the relative permittivity of several dielectrics used in
integrated circuits. In summary, the important message from Eq. (4.1) is that the capacitance
is proportional to the overlap between the conductors and inversely proportional to their
separation.
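
As a quick numerical illustration of Eq. (4.1), the following Python sketch evaluates the parallel-plate (area) capacitance of a wire. The function name and the example wire dimensions are assumptions chosen for illustration only; they are not taken from a particular process.

```python
EPS_0 = 8.854e-12  # permittivity of free space in F/m, as quoted in the text


def parallel_plate_capacitance(width_m, length_m, t_di_m, eps_r):
    """Total area capacitance of a wire per Eq. (4.1), in farads.

    width_m, length_m: wire width W and length L in meters
    t_di_m: dielectric thickness t_di in meters
    eps_r: relative permittivity of the dielectric (Table 4.1)
    """
    if t_di_m <= 0:
        raise ValueError("dielectric thickness must be positive")
    return eps_r * EPS_0 * width_m * length_m / t_di_m


# Hypothetical wire: 1 um wide, 1 mm long, over 1 um of SiO2 (eps_r = 3.9).
c_total = parallel_plate_capacitance(1e-6, 1e-3, 1e-6, 3.9)
print(f"{c_total * 1e15:.1f} fF")
```

For these assumed dimensions the result is about 34.5 fF. Doubling the dielectric thickness halves the value, and doubling W or L doubles it, which is the proportionality stated above.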



Table 4.1 Relative permittivity of some typical dielectric materials.

Material                     ε_r
Free space                   1
Aerogels                     ~1.5
Polyimides (organic)         3-4
Silicon dioxide              3.9
Glass-epoxy (PC board)       5
Silicon Nitride (Si3N4)      7.5
Alumina (package)            9.5
Silicon                      11.7




Figure 4.3 Parallel-plate capacitance model of interconnect wire (labels: electrical-field lines,
dielectric, substrate).



1 To differentiate between distributed (per unit length) wire parameters versus total lumped values, we 
will use lowercase to denote the former and uppercase for the latter. 
In actuality, this model is too simplistic. To minimize the resistance of the wires
while scaling technology, it is desirable to keep the cross-section of the wire ($W \times H$) as
large as possible, as will become apparent in a later section. On the other hand, small
values of W lead to denser wiring and less area overhead. As a result, we have over the
years witnessed a steady reduction in the W/H ratio, such that it has even dropped below
unity in advanced processes. This is clearly visible on the process cross-section of FIGURE.
Under those circumstances, the parallel-plate model assumed above becomes inaccurate.
The capacitance between the side-walls of the wires and the substrate, called the
fringing capacitance, can no longer be ignored and contributes to the overall capacitance.
This effect is illustrated in Figure 4.4a. Presenting an exact model for this difficult geometry
is hard. So, as good engineering practice dictates, we will use a simplified model that
approximates the capacitance as the sum of two components (Figure 4.4b): a parallel-plate
capacitance determined by the orthogonal field between a wire of width w and the ground
plane, in parallel with the fringing capacitance modeled by a cylindrical wire with a
dimension equal to the interconnect thickness H. The resulting approximation is simple
and works fairly well in practice.

Figure 4.4 The fringing-field capacitance: (a) fringing fields; (b) model of fringing-field
capacitance. The model decomposes the capacitance into two contributions: a parallel-plate
capacitance, and a fringing capacitance, modeled by a cylindrical wire with a diameter equal to
the thickness of the wire.

$$c_{wire} = c_{pp} + c_{fringe} = \frac{w\,\varepsilon_{di}}{t_{di}} + \frac{2\pi\,\varepsilon_{di}}{\log(t_{di}/H)} \qquad (4.2)$$

with $w = W - H/2$ a good approximation for the width of the parallel-plate capacitor.
Numerous more accurate models (e.g. [Vdmeijs84]) have been developed over time, but
these tend to be substantially more complex, and defeat our goal of developing a conceptual
understanding.
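
The two-component model can be evaluated numerically with a short Python sketch. It is only an illustration under stated assumptions: the logarithm is taken to be the natural logarithm, the function name and the example dimensions are hypothetical, and the expression is only evaluated for $t_{di} > H$, since the logarithm is zero or negative otherwise.

```python
import math

EPS_0 = 8.854e-12  # permittivity of free space in F/m


def wire_capacitance_per_length(W, H, t_di, eps_r):
    """Return (c_pp, c_fringe, c_wire) in F/m for the simplified fringing model.

    W, H: wire width and thickness in meters
    t_di: dielectric thickness in meters
    eps_r: relative permittivity of the dielectric
    Assumption: log() in the model is the natural logarithm.
    """
    if t_di <= H:
        # log(t_di / H) <= 0 here, so the simplified expression breaks down.
        raise ValueError("this sketch requires t_di > H")
    w = W - H / 2.0  # effective width of the parallel-plate part
    if w < 0:
        raise ValueError("effective width W - H/2 must be non-negative")
    eps_di = eps_r * EPS_0
    c_pp = w * eps_di / t_di
    c_fringe = 2.0 * math.pi * eps_di / math.log(t_di / H)
    return c_pp, c_fringe, c_pp + c_fringe


# Hypothetical geometry: W = 1 um, H = 0.5 um, t_di = 1 um, SiO2 dielectric.
c_pp, c_fringe, c_wire = wire_capacitance_per_length(1e-6, 0.5e-6, 1e-6, 3.9)
print(f"c_pp = {c_pp:.3e} F/m, c_fringe = {c_fringe:.3e} F/m, c_wire = {c_wire:.3e} F/m")
```

For this assumed geometry the fringing term is larger than the parallel-plate term, which is in line with the observation that fringing cannot be ignored once W becomes comparable to H.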

To illustrate the importance of the fringing-field component, Figure 4.5 plots the
value of the wiring capacitance as a function of (W/H). For larger values of (W/H) the total
capacitance approaches the parallel-plate model. For (W/H) smaller than 1.5, the fringing
component actually becomes the dominant component. The fringing capacitance can
increase the overall capacitance by a factor of more than 10 for small line widths. It is
interesting to observe that the total capacitance levels off to a constant value of approximately
1 pF/cm for line widths smaller than the insulator thickness. In other words, the
capacitance is no longer a function of the width.




Figure 4.5 Capacitance of interconnect wire as a function of ($W/t_{di}$), including fringing-field effects
(from [Schaper83]). Two values of $H/t_{di}$ are considered. Silicon-dioxide with $\varepsilon_r = 3.9$ is used as
dielectric.

So far, we have restricted our analysis to the case of a single rectangular conductor 
placed over a ground plane. This structure, called a microstripline, used to be a good 
model for semiconductor interconnections when the number of interconnect layers was 
restricted to 1 or 2. Today’s processes offer many more layers of interconnect, which are 
packed quite densely in addition. In this scenario, the assumption that a wire is completely 
isolated from its surrounding structures and is only capacitively coupled to ground, 
becomes untenable. This is illustrated in Figure 4.6, where the capacitance components of 
a wire embedded in an interconnect hierarchy are identified. Each wire is not only coupled 
to the grounded substrate, but also to the neighboring wires on the same layer and on adjacent
layers. To a first order, this does not change the total capacitance connected to a given
wire. The main difference is that not all of its capacitive components terminate at the
grounded substrate; a large number of them connect to other wires, which have
dynamically varying voltage levels. We will later see that these floating capacitors not only
form a source of noise (crosstalk), but can also have a negative impact on the performance
of the circuit.

In summary, inter-wire capacitances become a dominant factor in multi-layer interconnect
structures. This effect is more pronounced for wires in the higher interconnect layers,
as these wires are farther away from the substrate. The increasing contribution of the
inter-wire capacitance to the total capacitance with decreasing feature sizes is best
illustrated by Figure 4.7. In this graph, which plots the capacitive components of a set of
parallel wires routed above a ground plane, it is assumed that dielectric and wire thickness are
held constant while scaling all other dimensions. When W becomes smaller than 1.75 H,
the inter-wire capacitance starts to dominate.

Figure 4.6 Capacitive coupling between wires in interconnect hierarchy.





Figure 4.7 Interconnect capacitance as 
a function of design rules. It consists of a 
capacitance to ground and an inter-wire 
capacitance (from [Schaper83]). 



Interconnect Capacitance Design Data 



A set of typical interconnect capacitances for a standard 0.25 µm CMOS process is given in
Table 4.2. The process supports 1 layer of polysilicon and 5 layers of Aluminum. Metal layers
1 to 4 have the same thickness and use a similar dielectric, while the wires at metal layer 5 are
almost twice as thick and are embedded in a dielectric with a higher permittivity. When placing
the wires over the thick field oxide that is used to isolate different transistors, use the “Field”
column in the table, while wires routed over the active area see a higher capacitance as seen in
the “Active” column. Be aware that the presented values are only indicative. To obtain more
accurate results for actual structures, complex 3-dimensional models should be used that take
the environment of the wire into account.

Table 4.2 Wire area and fringe capacitance values for typical 0.25 µm CMOS process. The
table rows represent the top plate of the capacitor, the columns the bottom plate. The
area capacitances are expressed in aF/µm², while the fringe capacitances (given in the
rows marked "fringe", shaded in the original) are in aF/µm.

                  Field   Active   Poly   Al1    Al2    Al3    Al4
Poly   (area)     88
       (fringe)   54
Al1    (area)     30      41       57
       (fringe)   40      47       54
Al2    (area)     13      15       17     36
       (fringe)   25      27       29     45
Al3    (area)     8.9     9.4      10     15     41
       (fringe)   18      19       20     27     49
Al4    (area)     6.5     6.8      7      8.9    15     35
       (fringe)   14      15       15     18     27     45
Al5    (area)     5.2     5.4      5.4    6.6    9.1    14     38
       (fringe)   12      12       12     14     19     27     52



Table 4.3 tabulates indicative values for the capacitances between parallel wires placed 
on the same layer with a minimum spacing (as dictated by the design rules). Observe that these 
numbers include both the parallel plate and fringing components. Once again, the capacitances 
are a strong function of the topology. For instance, a ground plane placed on a neighboring 
layer terminates a large fraction of the fringing field, and effectively reduces the inter-wire 
capacitance. The polysilicon wires experience a reduced inter-wire capacitance due to the 
smaller thickness of the wires. On the other hand, the thick Al5 wires display the highest 
inter-wire capacitance. It is therefore advisable to either separate wires at this level by an 
amount that is larger than the minimum allowed, or to use it for global signals that are not that 
sensitive to interference. The supply rails are an example of the latter. 

Table 4.3 Inter-wire capacitance per unit wire length for different interconnect layers of a typical 
0.25 μm CMOS process. The capacitances are expressed in aF/μm, and are for minimally-spaced wires. 

Layer                 Poly   Al1   Al2   Al3   Al4   Al5
Capacitance (aF/μm)   40     95    85    85    85    115















Example 4.1 Capacitance of Metal Wire 

Some global signals, such as clocks, are distributed all over the chip. The length of those 
wires can be substantial. For die sizes between 1 and 2 cm, wires can reach a length of 10 cm 
and have associated wire capacitances of substantial value. Consider an aluminum wire that is 
10 cm long and 1 μm wide, routed on the first Aluminum layer. We can compute the value of the 
total capacitance using the data presented in Table 4.2 (first Aluminum layer over field oxide). 

Area (parallel-plate) capacitance: (0.1 × 10^6 μm²) × 30 aF/μm² = 3 pF 

Fringing capacitance: 2 × (0.1 × 10^6 μm) × 40 aF/μm = 8 pF 

Total capacitance: 11 pF 

Notice the factor 2 in the computation of the fringing capacitance, which takes the two sides 
of the wire into account. 

Suppose now that a second wire is routed alongside the first one, separated by only the 
minimum allowed distance. From Table 4.3, we can determine that this wire will couple to 
the first with a capacitance equal to 

$C_{inter}$ = (0.1 × 10^6 μm) × 95 aF/μm = 9.5 pF 

which is almost as large as the total capacitance to ground! 

A similar exercise shows that moving the wire to Al4 would reduce the capacitance to 
ground to 3.45 pF (0.65 pF area and 2.8 pF fringe), while the inter-wire capacitance would 
remain approximately the same at 8.5 pF. 
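
The arithmetic of this example can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, the capacitance values are the field-oxide entries of Table 4.2 and the minimum-spacing entries of Table 4.3, and the result is no more accurate than those indicative table values.

```python
# Sketch of the Example 4.1 arithmetic. Lengths are in micrometers and table
# entries in attofarads (1 aF = 1e-6 pF). Function names are illustrative only.

AF_TO_PF = 1e-6


def ground_capacitance_pf(length_um, width_um, area_af_per_um2, fringe_af_per_um):
    """Return (area, fringe, total) capacitance to ground in pF."""
    area = length_um * width_um * area_af_per_um2 * AF_TO_PF
    # Both edges of the wire contribute a fringing field, hence the factor 2.
    fringe = 2 * length_um * fringe_af_per_um * AF_TO_PF
    return area, fringe, area + fringe


def interwire_capacitance_pf(length_um, coupling_af_per_um):
    """Coupling capacitance in pF to one minimum-spaced neighbor."""
    return length_um * coupling_af_per_um * AF_TO_PF


LENGTH_UM = 0.1e6  # 10 cm expressed in micrometers

al1 = ground_capacitance_pf(LENGTH_UM, 1.0, 30, 40)   # first metal layer, field column
al4 = ground_capacitance_pf(LENGTH_UM, 1.0, 6.5, 14)  # fourth metal layer, field column
al1_inter = interwire_capacitance_pf(LENGTH_UM, 95)   # Table 4.3, first metal layer
al4_inter = interwire_capacitance_pf(LENGTH_UM, 85)   # Table 4.3, fourth metal layer

print("Al1 (area, fringe, total) in pF:", al1, "inter-wire:", al1_inter)
print("Al4 (area, fringe, total) in pF:", al4, "inter-wire:", al4_inter)
```

Keeping the area and fringe terms separate makes it easy to see that, for a 1 μm wide wire, the fringe term dominates the capacitance to ground on both layers.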



4.3.2 Resistance 

The resistance of a wire is proportional to its length L and inversely proportional to its 
cross-section A. The resistance of a rectangular conductor in the style of Figure 4.3 can be 
expressed as 

$$R = \frac{\rho L}{A} = \frac{\rho L}{HW} \tag{4.3}$$

where the constant ρ is the resistivity of the material (in Ω-m). The resistivities of some 
commonly-used conductive materials are tabulated in Table 4.4. Aluminum is the interconnect 
material most often used in integrated circuits because of its low cost and its compatibility 
with the standard integrated-circuit fabrication process. Unfortunately, it has a 
large resistivity compared to materials such as Copper. With ever-increasing performance 
targets, this is rapidly becoming a liability and top-of-the-line processes are now increasingly 
using Copper as the conductor of choice. 



Table 4.4 Resistivity of commonly-used conductors (at 20°C). 

Material          ρ (Ω-m)
Silver (Ag)       1.6 × 10^-8
Copper (Cu)       1.7 × 10^-8
Gold (Au)         2.2 × 10^-8
Aluminum (Al)     2.7 × 10^-8
Tungsten (W)      5.5 × 10^-8



Since H is a constant for a given technology, Eq. (4.3) can be rewritten as follows, 

$$R = R_\square \frac{L}{W} \tag{4.4}$$

with 

$$R_\square = \frac{\rho}{H} \tag{4.5}$$

the sheet resistance of the material, having units of Ω/□ (pronounced as Ohm-per-square). 
This expresses that the resistance of a square conductor is independent of its absolute size, 
as is apparent from Eq. (4.4). To obtain the resistance of a wire, simply multiply the sheet 
resistance by its ratio (L/W). 



Interconnect Resistance Design Data 



Typical values of the sheet resistance of various interconnect materials are given in Table 4.5. 

Table 4.5 Sheet resistance values for a typical 0.25 μm CMOS process. 

Material                            Sheet Resistance (Ω/□)
n- or p-well diffusion              1000 - 1500
n+, p+ diffusion                    50 - 150
n+, p+ diffusion with silicide      3 - 5
n+, p+ polysilicon                  150 - 200
n+, p+ polysilicon with silicide    4 - 5
Aluminum                            0.05 - 0.1



From this table, we conclude that Aluminum is the preferred material for the wiring of 
long interconnections. Polysilicon should only be used for local interconnect. Although the 
sheet resistance of the diffusion layer (n+, p+) is comparable to that of polysilicon, the use of 
diffusion wires should be avoided due to its large capacitance and the associated RC delay. 

Advanced processes also offer silicided polysilicon and diffusion layers. A silicide is 
a compound material formed using silicon and a refractory metal. This creates a highly 
conductive material that can withstand high-temperature process steps without melting. 
Examples of silicides are WSi2, TiSi2, PtSi2, and TaSi. WSi2, for instance, has a resistivity 
ρ of 130 μΩ-cm, which is approximately eight times lower than polysilicon. The silicides 
are most often used in a configuration called a polycide, which is a simple layered 
combination of polysilicon and a silicide. A typical polycide consists of a lower level of 
polysilicon with an upper coating of silicide and combines the best properties of both 
materials: good adherence and coverage (from the poly) and high conductance (from the 
silicide). A MOSFET fabricated with a polycide gate is shown in Figure 4.8. The advantage 
of the silicided gate is a reduced gate resistance. Similarly, silicided source and drain 
regions reduce the source and drain resistance of the device. 

Figure 4.8 A polycide-gate MOSFET (labeled layers: silicide, polysilicon, SiO2). 

Transitions between routing layers add extra resistance to a wire, called the contact 
resistance. The preferred routing strategy is thus to keep signal wires on a single layer 
whenever possible and to avoid excess contacts or vias. It is possible to reduce the contact 
resistance by making the contact holes larger. Unfortunately, current tends to concentrate 
around the perimeter in a larger contact hole. This effect, called current crowding, puts a 
practical upper limit on the size of the contact. The following contact resistances (for 
minimum-size contacts) are typical for a 0.25 μm process: 5-20 Ω for metal or polysilicon to 
n+, p+, and metal to polysilicon; 1-5 Ω for vias (metal-to-metal contacts). 






Example 4.2 Resistance of a Metal Wire 

Consider again the aluminum wire of Example 4.1, which is 10 cm long and 1 μm wide, and 
is routed on the first Aluminum layer. Assuming a sheet resistance for Al1 of 0.075 Ω/□, we 
can compute the total resistance of the wire 

$R_{wire}$ = 0.075 Ω/□ × (0.1 × 10^6 μm) / (1 μm) = 7.5 kΩ 

Implementing the wire in polysilicon with a sheet resistance of 175 Ω/□ raises the overall 
resistance to 17.5 MΩ, which is clearly unacceptable. Silicided polysilicon with a sheet 
resistance of 4 Ω/□ offers a better alternative, but still translates into a wire with a 400 kΩ 
resistance. 
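
The sheet-resistance calculation of Eq. (4.4) reduces to counting squares along the wire. The short Python sketch below repeats the three cases of this example; the function name and the dictionary of sheet resistances are illustrative choices, with the values taken from the example itself.

```python
# Sketch of Eq. (4.4): R = R_sheet * (L / W). Names are illustrative only.


def wire_resistance_ohm(sheet_resistance_ohm_sq, length_um, width_um):
    """Resistance of a rectangular wire from its sheet resistance."""
    squares = length_um / width_um  # number of squares along the wire
    return sheet_resistance_ohm_sq * squares


LENGTH_UM = 0.1e6  # 10 cm
WIDTH_UM = 1.0

SHEET_RESISTANCE = {            # ohm per square, values from the example
    "aluminum": 0.075,
    "polysilicon": 175.0,
    "silicided polysilicon": 4.0,
}

resistances = {
    name: wire_resistance_ohm(rs, LENGTH_UM, WIDTH_UM)
    for name, rs in SHEET_RESISTANCE.items()
}

for name, r in resistances.items():
    print(f"{name}: {r:.3g} ohm")
```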



So far, we have considered the resistance of a semiconductor wire to be linear and 
constant. This is definitely the case for most semiconductor circuits. At very high 
frequencies, however, an additional phenomenon called the skin effect comes into play, such 
that the resistance becomes frequency-dependent. High-frequency currents tend to flow 
primarily on the surface of a conductor, with the current density falling off exponentially 
with depth into the conductor. The skin depth δ is defined as the depth where the current 
falls off to a value of $e^{-1}$ of its nominal value, and is given by 

$$\delta = \sqrt{\frac{\rho}{\pi f \mu}} \tag{4.6}$$

with f the frequency of the signal and μ the permeability of the surrounding dielectric 
(typically equal to the permeability of free space, or μ = 4π × 10^-7 H/m). For Aluminum at 1 
GHz, the skin depth is equal to 2.6 μm. The obvious question is whether this is something 
we should be concerned about when designing state-of-the-art digital circuits. 

The effect can be approximated by assuming that the current flows uniformly in an 
outer shell of the conductor with thickness δ, as is illustrated in Figure 4.9 for a 
rectangular wire. Assuming that the overall cross-section of the wire is now limited to 
approximately 2(W+H)δ, we obtain the following expression for the resistance (per unit length) 
at high frequencies ($f > f_s$): 

$$r(f) = \frac{\sqrt{\pi f \mu \rho}}{2(H+W)} \tag{4.7}$$

Figure 4.9 The skin-effect reduces the flow of the current to the surface of the wire. 



The increased resistance at higher frequencies may cause an extra attenuation (and 
hence distortion) of the signal being transmitted over the wire. To determine the onset 
of the skin-effect, we can find the frequency $f_s$ where the skin depth is equal to half the 
largest dimension (W or H) of the conductor. Below $f_s$ the whole wire is conducting current, 
and the resistance is equal to the (constant) low-frequency resistance of the wire. From 
Eq. (4.6), we find the value of $f_s$: 

$$f_s = \frac{4\rho}{\pi \mu (\max(W, H))^2} \tag{4.8}$$



Example 4.3 Skin-effect and Aluminum wires 

We determine the impact of the skin-effect on contemporary integrated circuits by analyzing 
an Aluminum wire with a resistivity of 2.7 × 10^-8 Ω-m, embedded in a SiO2 dielectric 
with a permeability of 4π × 10^-7 H/m. From Eq. (4.8), we find that the largest dimension of 
the wire should be at least 5.2 μm for the effect to be noticeable at 1 GHz. This is confirmed 
by the more accurate simulation results of Figure 4.10, which plots the increase in 
resistance due to skin effects for different width Aluminum conductors. A 30% increase in 
resistance can be observed at 1 GHz for a 20 μm wire, while the increase for a 1 μm wire 
is less than 1%. 
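
Eqs. (4.6) and (4.8) are easy to evaluate numerically. The following Python sketch is one way to do so; the function names are invented for illustration, and the wire dimensions passed in at the end are simply the ones discussed in this example.

```python
import math

# Sketch of Eq. (4.6) and Eq. (4.8) in SI units. Function names are illustrative.

MU_0 = 4 * math.pi * 1e-7   # permeability of free space, H/m
RHO_AL = 2.7e-8             # resistivity of Aluminum, ohm-m (Table 4.4)


def skin_depth_m(resistivity, frequency_hz, mu=MU_0):
    """Skin depth delta = sqrt(rho / (pi * f * mu))."""
    return math.sqrt(resistivity / (math.pi * frequency_hz * mu))


def onset_frequency_hz(resistivity, width_m, height_m, mu=MU_0):
    """Frequency f_s at which delta equals half the largest wire dimension."""
    largest = max(width_m, height_m)
    return 4 * resistivity / (math.pi * mu * largest ** 2)


delta_1ghz = skin_depth_m(RHO_AL, 1e9)
print(f"skin depth at 1 GHz: {delta_1ghz * 1e6:.2f} um")

# Onset frequency for a 0.7 um thick wire of three different widths.
for width_um in (1.0, 5.2, 20.0):
    f_s = onset_frequency_hz(RHO_AL, width_um * 1e-6, 0.7e-6)
    print(f"W = {width_um} um: f_s = {f_s / 1e9:.3g} GHz")
```

The two functions are consistent by construction: a wire whose largest dimension equals twice the 1 GHz skin depth has an onset frequency of 1 GHz.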



Figure 4.10 Skin-effect induced increase in resistance as a function of frequency (Hz) and 
wire width, for conductors of different widths. All simulations were performed for a wire 
thickness of 0.7 μm [Sylvester97]. 



In summary, the skin-effect is only an issue for wider wires. Since clocks tend to 
carry the highest-frequency signals on a chip and also are fairly wide to limit resistance, 
the skin effect is likely to have its first impact on these lines. This is a real concern for 
GHz-range design, as clocks determine the overall performance of the chip (cycle time, 
instructions per second, etc.). Another major design concern is that the adoption of better 
conductors such as Copper may move the on-set of skin-effects to lower frequencies. 

4.3.3 Inductance 

Integrated-circuit designers tend to dismiss inductance as something they heard about in 
their physics classes, but that has no impact on their field. This was definitely the case in 
the first decades of integrated digital circuit design. Yet with the adoption of low-resistive 
interconnect materials and the increase of switching frequencies to the super GHz range, 
inductance starts to play a role even on a chip. Consequences of on-chip inductance 
include ringing and overshoot effects, reflections of signals due to impedance mismatch, 
inductive coupling between lines, and switching noise due to Ldi/dt voltage drops. 

The inductance of a section of a circuit can always be evaluated with the aid of its 
definition, which states that a changing current passing through an inductor generates a 
voltage drop ΔV 

$$\Delta V = L \frac{di}{dt} \tag{4.9}$$

It is possible to compute the inductance of a wire directly from its geometry and its 
environment. A simpler approach relies on the fact that the capacitance c and the 
inductance l (per unit length) of a wire are related by the following expression 

$$cl = \varepsilon\mu \tag{4.10}$$

with ε and μ respectively the permittivity and permeability of the surrounding dielectric. 
The caveat is that for this expression to be valid the conductor must be completely 
surrounded by a uniform dielectric medium. This is most often not the case. Yet even when 
the wire is embedded in different dielectric materials, it is possible to adopt “average” 
dielectric constants such that Eq. (4.10) can still be used to get an approximate value of 
the inductance. 

Some other interesting relations, obtained from Maxwell’s laws, can be pointed out. 
The constant product of permeability and permittivity also defines the speed v at which an 
electromagnetic wave can propagate through the medium 

$$v = \frac{1}{\sqrt{lc}} = \frac{1}{\sqrt{\varepsilon\mu}} = \frac{c_0}{\sqrt{\varepsilon_r \mu_r}} \tag{4.11}$$

$c_0$ equals the speed of light (30 cm/nsec) in a vacuum. The propagation speeds for a number 
of materials used in the fabrication of electronic circuits are tabulated in Table 4.6. 
The propagation speed for SiO2 is about half that in a vacuum. 

Table 4.6 Dielectric constants and wave-propagation speeds for various materials used in electronic 
circuits. The relative permeability $\mu_r$ of most dielectrics is approximately equal to 1. 

Dielectric                    $\varepsilon_r$    Propagation speed (cm/nsec)
Vacuum                        1                  30
SiO2                          3.9                15
PC board (epoxy glass)        5.0                13
Alumina (ceramic package)     9.5                10
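
The speeds in Table 4.6 follow directly from Eq. (4.11) once the relative permeability is taken as 1. A minimal Python sketch, with illustrative names and the dielectric constants copied from the table:

```python
import math

# Sketch of Eq. (4.11): v = c0 / sqrt(eps_r * mu_r). Names are illustrative.

C0_CM_PER_NSEC = 30.0  # speed of light in a vacuum, as quoted in the text

DIELECTRIC_CONSTANT = {
    "Vacuum": 1.0,
    "SiO2": 3.9,
    "PC board (epoxy glass)": 5.0,
    "Alumina (ceramic package)": 9.5,
}


def propagation_speed_cm_per_nsec(eps_r, mu_r=1.0):
    """Wave-propagation speed in a medium with the given relative constants."""
    return C0_CM_PER_NSEC / math.sqrt(eps_r * mu_r)


speeds = {
    name: propagation_speed_cm_per_nsec(eps_r)
    for name, eps_r in DIELECTRIC_CONSTANT.items()
}

for name, v in speeds.items():
    print(f"{name}: {v:.1f} cm/nsec")
```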





Example 4.4 Inductance of a Semiconductor Wire 

Consider an Al1 wire implemented in the 0.25 micron CMOS technology and routed on top 
of the field oxide. From Table 4.2, we can derive the capacitance of the wire per unit length: 

c = (W × 30 + 2 × 40) aF/μm 

From Eq. (4.10), we can derive the inductance per unit length of the wire, assuming 
SiO2 as the dielectric and assuming a uniform dielectric (make sure to use the correct units!) 

l = (3.9 × 8.854 × 10^-12) × (4π × 10^-7) / c 

For wire widths of 0.4 μm, 1 μm and 10 μm, this leads to the following numbers: 

W = 0.4 μm: c = 92 aF/μm; l = 0.47 pH/μm 
W = 1 μm: c = 110 aF/μm; l = 0.39 pH/μm 
W = 10 μm: c = 380 aF/μm; l = 0.11 pH/μm 

Assuming a sheet resistance of 0.075 Ω/□, we can also determine the resistance of the 
wire, 

r = 0.075/W Ω/μm 

It is interesting to observe that the inductive part of the wire impedance becomes equal 
in value to the resistive component at a frequency of 27.5 GHz (for a 1 μm wide wire), as can 
be obtained from solving the following expression: 

ωl = 2πfl = r 

For extra wide wires, this frequency reduces to approximately 11 GHz. For wires with 
a smaller capacitance and resistance (such as the thicker wires located at the upper 
interconnect layers), this frequency can become as low as 500 MHz, especially when better 
interconnect materials such as Copper are being used. Yet, these numbers indicate that 
inductance only becomes an issue in integrated circuits for frequencies that are well above 1 GHz. 
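
The unit conversions in this example are where mistakes tend to creep in, so a small Python sketch may help. The function names are invented for illustration, and the uniform-SiO2 assumption is the same one made in the example. With the rounded inputs listed, the sketch lands near 30 GHz for the 1 μm wire and near 10 GHz for the 10 μm wire, the same range as the frequencies quoted in the example.

```python
import math

# Sketch of Example 4.4: c from Table 4.2, l from Eq. (4.10), and the frequency
# at which 2*pi*f*l equals r. Function names are illustrative only.

EPS_0 = 8.854e-12           # permittivity of free space, F/m
MU_0 = 4 * math.pi * 1e-7   # permeability of free space, H/m
EPS_R_SIO2 = 3.9


def capacitance_af_per_um(width_um):
    """First metal layer over field oxide: area term plus two fringe terms."""
    return width_um * 30 + 2 * 40


def inductance_ph_per_um(width_um):
    """l = eps * mu / c, evaluated in SI units and converted to pH/um."""
    c_f_per_m = capacitance_af_per_um(width_um) * 1e-18 / 1e-6
    l_h_per_m = EPS_R_SIO2 * EPS_0 * MU_0 / c_f_per_m
    return l_h_per_m * 1e6  # 1 H/m equals 1e6 pH/um


def crossover_frequency_hz(width_um, sheet_resistance=0.075):
    """Frequency at which the inductive impedance equals the resistance."""
    r_ohm_per_um = sheet_resistance / width_um
    l_h_per_um = inductance_ph_per_um(width_um) * 1e-12
    return r_ohm_per_um / (2 * math.pi * l_h_per_um)


for w in (0.4, 1.0, 10.0):
    print(f"W = {w} um: c = {capacitance_af_per_um(w):.0f} aF/um, "
          f"l = {inductance_ph_per_um(w):.2f} pH/um, "
          f"f = {crossover_frequency_hz(w) / 1e9:.1f} GHz")
```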



4.4 Electrical Wire Models 

In previous sections, we have introduced the electrical properties of the interconnect wire 
— capacitance, resistance, and inductance — and presented some simple relations and 
techniques to derive their values from the interconnect geometries and topologies. These 
parasitic elements have an impact on the electrical behavior of the circuit and influence its 
delay, power dissipation, and reliability. To study these effects requires the introduction of 
electrical models that estimate and approximate the real behavior of the wire as a function 
of its parameters. These models vary from very simple to very complex depending upon 
the effects that are being studied and the required accuracy. In this section, we first derive 
models for manual analysis; how to cope with interconnect wires in the SPICE circuit 
simulator is the topic that follows next. 

4.4.1 The Ideal Wire 

In schematics, wires occur as simple lines with no attached parameters or parasitics. These 
wires have no impact on the electrical behavior of the circuit. A voltage change at one end 
of the wire propagates immediately to its other ends, even if those are some distance away. 
Hence, it may be assumed that the same voltage is present at every segment of the wire at 
every point in time, and that the whole wire is an equipotential region. While this 
ideal-wire model is simplistic, it has its value, especially in the early phases of the design 
process when the designer wants to concentrate on the properties and the behavior of the 
transistors that are being connected. Also, when studying small circuit components such 
as gates, the wires tend to be very short and their parasitics ignorable. Taking these into 
account would just make the analysis unnecessarily complex. More often though, wire 
parasitics play a role and more complex models should be considered. 

4.4.2 The Lumped Model 

The circuit parasitics of a wire are distributed along its length and are not lumped into a 
single position. Yet, when only a single parasitic component is dominant, when the inter- 
action between the components is small, or when looking at only one aspect of the circuit 
behavior, it is often useful to lump the different fractions into a single circuit element. The 
advantage of this approach is that the effects of the parasitic then can be described by an 
ordinary differential equation. As we will see later, the description of a distributed element 
requires partial differential equations. 

As long as the resistive component of the wire is small and the switching frequen- 
cies are in the low to medium range, it is meaningful to consider only the capacitive com- 
ponent of the wire, and to lump the distributed capacitance into a single capacitor as 
shown in Figure 4.11. Observe that in this model the wire still represents an equipotential 
region, and that the wire itself does not introduce any delay. The only impact on perfor- 
mance is introduced by the loading effect of the capacitor on the driving gate. This capac- 
itive lumped model is simple, yet effective, and is the model of choice for the analysis of 
most interconnect wires in digital integrated circuits. 




Figure 4.11 Distributed versus lumped capacitance model of wire. $C_{lumped} = L \times c_{wire}$, with L the 
length of the wire and $c_{wire}$ the capacitance per unit length. The driver is modeled as a voltage 
source and a source resistance $R_{driver}$. 





Example 4.5 Lumped capacitance model of wire 

For the circuit of Figure 4.11, assume that a driver with a source resistance of 10 kΩ is used to 
drive a 10 cm long, 1 μm wide Al1 wire. In Example 4.1, we have found that the total lumped 
capacitance for this wire equals 11 pF. 

The operation of this simple RC network is described by the following ordinary differential 
equation (similar to the expression derived in Example 1.6): 

$$C_{lumped}\frac{dV_{out}}{dt} + \frac{V_{out} - V_{in}}{R_{driver}} = 0$$

When applying a step input (with $V_{in}$ going from 0 to V), the transient response of this circuit is 
known to be an exponential function, and is given by the following expression (where τ = 
$R_{driver} \times C_{lumped}$, the time constant of the network): 

$$V_{out}(t) = (1 - e^{-t/\tau})V$$

The time to reach the 50% point is easily computed as t = ln(2)τ = 0.69τ. Similarly, it takes t 
= ln(9)τ = 2.2τ to rise from the 10% to the 90% point (reaching the 90% point from zero takes 
ln(10)τ = 2.3τ). Plugging in the numbers for this specific example yields 

$t_{50\%}$ = 0.69 × 10 kΩ × 11 pF = 76 nsec 
$t_{10\%-90\%}$ = 2.2 × 10 kΩ × 11 pF = 242 nsec 

These numbers are not even acceptable for the lowest performance digital circuits. Techniques to 
deal with this bottleneck, such as reducing the source resistance of the driver, will be introduced in 
Chapter ZZZ. 
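
The single-pole response of the lumped capacitance model can be checked with a few lines of Python. The sketch below is illustrative only: the function names are invented, ln(2)τ is used for the 50% point, and ln(9)τ is used for the 10%-to-90% rise time, which is the interval the 2.2τ factor corresponds to.

```python
import math

# Sketch of the lumped-capacitance step response V_out(t) = (1 - exp(-t/tau)) * V.
# Function names are illustrative only.


def step_response(t, tau, v_final=1.0):
    """Output voltage of a single-pole RC network at time t after a step."""
    return (1.0 - math.exp(-t / tau)) * v_final


def lumped_delays(r_driver_ohm, c_lumped_f):
    """Time constant, 50% delay, and 10%-90% rise time of the RC network."""
    tau = r_driver_ohm * c_lumped_f
    return {
        "tau": tau,
        "t50": math.log(2) * tau,       # 0.69 * tau
        "t10_90": math.log(9) * tau,    # 2.2 * tau
    }


delays = lumped_delays(10e3, 11e-12)  # 10 kohm driver, 11 pF wire
for name, value in delays.items():
    print(f"{name}: {value * 1e9:.1f} nsec")
```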



While the lumped capacitor model is the most popular, sometimes it is also useful to 
present lumped models of a wire with respect to either resistance or inductance. This is 
often the case when studying the supply distribution network. Both the resistance and 
inductance of the supply wires can be interpreted as parasitic noise sources that introduce 
voltage drops and bounces on the supply rails. 

4.4.3 The Lumped RC model 

On-chip metal wires of over a few mm length have a significant resistance. The equipoten- 
tial assumption, presented in the lumped-capacitor model, is no longer adequate, and a 
resistive-capacitive model has to be adopted. 

A first approach lumps the total wire resistance of each wire segment into one single 
R and similarly combines the global capacitance into a single capacitor C. This simple 
model, called the lumped RC model, is pessimistic and inaccurate for long interconnect 
wires, which are more adequately represented by a distributed rc-model. Yet, before 
analyzing the distributed model, it is worthwhile to spend some time on the analysis and the 
modeling of lumped RC networks for the following reasons: 

• The distributed rc-model is complex and no closed form solutions exist. The behavior 
of the distributed rc-line can be adequately modeled by a simple RC network. 

• A common practice in the study of the transient behavior of complex transistor-wire 
networks is to reduce the circuit to an RC network. Having a means to analyze such 
a network effectively and to predict its first-order response would add a great asset 
to the designer's tool box. 

In Example 4.5, we analyzed a single resistor-single capacitor network. The behavior of 
such a network is fully described by a single differential equation, and its transient 
waveform is modeled by an exponential with a single time-constant (or network pole). 
Unfortunately, deriving the correct waveforms for a network with a larger number of capacitors 
and resistors rapidly becomes hopelessly complex: describing its behavior requires a set of 
ordinary differential equations, and the network now contains many time-constants (or 
poles and zeros). Short of running a full-fledged SPICE simulation, delay calculation 
methods such as the Elmore delay formula come to the rescue [Elmore48]. 

Consider the resistor-capacitor network of Figure 4.12. This circuit is called an RC- 
tree and has the following properties: 

• the network has a single input node (called s in Figure 4.12) 

• all the capacitors are between a node and the ground 

• the network does not contain any resistive loops (which makes it a tree) 

An interesting result of this particular circuit topology is that there exists a unique resistive 
path between the source node s and any node i of the network. The total resistance along 
this path is called the path resistance $R_{ii}$. For example, the path resistance between the 
source node s and node 4 in the example of Figure 4.12 equals 

$$R_{44} = R_1 + R_3 + R_4$$

The definition of the path resistance can be extended to address the shared path 
resistance $R_{ik}$, which represents the resistance shared among the paths from the root node s 
to nodes k and i: 

$$R_{ik} = \sum R_j \Rightarrow (R_j \in [path(s \rightarrow i) \cap path(s \rightarrow k)]) \tag{4.12}$$

For the circuit of Figure 4.12, $R_{i4} = R_1 + R_3$ while $R_{i2} = R_1$. 

Assume now that each of the N nodes of the network is initially discharged to GND, 
and that a step input is applied at node s at time t = 0. The Elmore delay at node i is then 
given by the following expression: 

$$\tau_{Di} = \sum_{k=1}^{N} C_k R_{ik} \tag{4.13}$$

The Elmore delay is equivalent to the first-order time constant of the network (or the first 
moment of the impulse response). The designer should be aware that this time-constant 
represents a simple approximation of the actual delay between source node and node i. Yet 
in most cases this approximation has proven to be quite reasonable and acceptable. It 
offers the designer a powerful mechanism for providing a quick estimate of the delay of a 
complex network. 



Example 4.6 RC delay of a tree-structured network 

Using Eq. (4.13), we can compute the Elmore delay for node i in the network of Figure 4.12. 

$$\tau_{Di} = R_1 C_1 + R_1 C_2 + (R_1 + R_3) C_3 + (R_1 + R_3) C_4 + (R_1 + R_3 + R_i) C_i$$
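
Eq. (4.13) can be turned into a short routine that works on any RC tree. The Python sketch below is a hypothetical illustration, not a production timing tool: each node records its upstream neighbor and the resistor connecting the two, and the shared path resistance is found by intersecting the two paths back to the source. The tree wired up at the end is an assumption inferred from the path resistances quoted in the text (node 1 fed through R1; nodes 2 and 3 hanging off node 1; nodes 4 and i hanging off node 3), and the component values are arbitrary test numbers.

```python
# Sketch of the Elmore delay, Eq. (4.13), for an RC tree. All names are
# illustrative; resistance[n] is the resistor between node n and parent[n].


def path_to_source(node, parent):
    """Set of nodes on the path from `node` up to the source (one resistor each)."""
    nodes = set()
    while node is not None:
        nodes.add(node)
        node = parent[node]
    return nodes


def elmore_delay(target, parent, resistance, capacitance):
    """Sum over all nodes k of C_k times the resistance shared with `target`."""
    target_path = path_to_source(target, parent)
    delay = 0.0
    for k, c_k in capacitance.items():
        shared = target_path & path_to_source(k, parent)
        r_shared = sum(resistance[n] for n in shared)
        delay += c_k * r_shared
    return delay


# Assumed topology; a parent of None means the node is fed by the source.
PARENT = {"1": None, "2": "1", "3": "1", "4": "3", "i": "3"}
RESISTANCE = {"1": 1.0, "2": 2.0, "3": 3.0, "4": 4.0, "i": 5.0}   # arbitrary values
CAPACITANCE = {"1": 1.0, "2": 2.0, "3": 3.0, "4": 4.0, "i": 5.0}  # arbitrary values

tau_i = elmore_delay("i", PARENT, RESISTANCE, CAPACITANCE)
print("Elmore delay at node i:", tau_i)
```

Identifying each resistor by its downstream node keeps the data structure small and guarantees that the network is a tree, since every node has exactly one upstream neighbor.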




As a special case of the RC tree network, let us consider the simple, non-branched 
RC chain (or ladder) shown in Figure 4.13. This network is worth analyzing because it is a 
structure that is often encountered in digital circuits, and also because it represents an 
approximate model of a resistive-capacitive wire. The Elmore delay of this chain network 
can be derived with the aid of Eq. (4.13): 

$$\tau_{DN} = \sum_{i=1}^{N} C_i \sum_{j=1}^{i} R_j = \sum_{i=1}^{N} C_i R_{ii} \tag{4.14}$$

or, in other words, the shared-path resistance is replaced by simply the path resistance. As an 
example, consider node 2 in the RC chain of Figure 4.13. Its time-constant consists of two 
components contributed by nodes 1 and 2. The component of node 1 consists of $C_1 R_1$, with $R_1$ the 
total resistance between the node and the source, while the contribution of node 2 equals 
$C_2(R_1 + R_2)$. The equivalent time constant at node 2 equals $C_1 R_1 + C_2(R_1 + R_2)$. $\tau_{Di}$ of node 
i can be derived in a similar way. 

$$\tau_{Di} = C_1 R_1 + C_2 (R_1 + R_2) + \ldots + C_i (R_1 + R_2 + \ldots + R_i)$$

Figure 4.13 RC chain. 




Example 4.7 Time-Constant of Resistive-Capacitive Wire 

The model presented in Figure 4.13 can be used as an approximation of a resistive-capacitive wire. 
The wire with a total length of L is partitioned into N identical segments, each with a length of L/N. 
The resistance and capacitance of each segment are hence given by rL/N and cL/N, respectively. 
Using the Elmore formula, we can compute the dominant time-constant of the wire: 

$$\tau_{DN} = \left(\frac{L}{N}\right)^2 (rc + 2rc + \ldots + Nrc) = (rcL^2)\frac{N(N+1)}{2N^2} = RC\,\frac{N+1}{2N} \tag{4.15}$$

with R (= rL) and C (= cL) the total lumped resistance and capacitance of the wire. For very large 
values of N, this model asymptotically approaches the distributed rc line. Eq. (4.15) then simplifies 
to the following expression: 

$$\tau_{DN} = \frac{RC}{2} = \frac{rcL^2}{2} \tag{4.16}$$

Eq. (4.16) leads to two important conclusions: 

• The delay of a wire is a quadratic function of its length! This means that doubling 
the length of the wire quadruples its delay. 

• The delay of the distributed rc line is one half of the delay that would have been 
predicted by the lumped RC model. The latter combines the total resistance and 
capacitance into single elements, and has a time-constant equal to RC (as is also 
obtained by setting N = 1 in Eq. (4.15)). This confirms the observation made earlier 
that the lumped model presents a pessimistic view on the delay of a resistive wire. 
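
Both conclusions can be checked numerically by summing the Elmore terms of an N-segment chain directly. The Python sketch below uses an illustrative function name and normalized values (R = C = 1), so the returned delay is expressed in units of RC.

```python
# Sketch of Eq. (4.15): Elmore delay of a wire split into N identical RC
# segments. The function name is illustrative only.


def chain_elmore_delay(total_r, total_c, n_segments):
    """Sum C_i times the path resistance from the source up to node i."""
    r_seg = total_r / n_segments
    c_seg = total_c / n_segments
    delay = 0.0
    path_resistance = 0.0
    for _ in range(n_segments):
        path_resistance += r_seg          # resistance from source to this node
        delay += c_seg * path_resistance
    return delay


for n in (1, 2, 10, 100, 1000):
    print(f"N = {n}: tau = {chain_elmore_delay(1.0, 1.0, n):.4f} RC")

# Doubling the wire length doubles both R and C, so the delay quadruples.
ratio = chain_elmore_delay(2.0, 2.0, 1000) / chain_elmore_delay(1.0, 1.0, 1000)
print("delay ratio after doubling the length:", ratio)
```

With N = 1 the sum returns the lumped value RC, and as N grows it approaches RC/2, following the (N+1)/(2N) factor of Eq. (4.15).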



WARNING: Be aware that an RC-chain is characterized by a number of time-constants. 
The Elmore expression determines the value of only the dominant one, and presents thus a 
first-order approximation. 



The Elmore delay formula has proven to be extremely useful. Besides making it 
possible to analyze wires, the formula can also be used to approximate the propagation 
delay of complex transistor networks. In the switch model, transistors are replaced by their 
equivalent, linearized on-resistance. The evaluation of the propagation delay is then 
reduced to the analysis of the resulting RC network. More precise minimum and maxi- 
mum bounds on the voltage waveforms in an RC tree have further been established 
[Rubinstein83]. These bounds have formed the base for most computer-aided timing ana- 
lyzers at the switch and functional level [Horowitz83]. An interesting result [Lin84] is that 
the exponential voltage waveform with the Elmore delay as time constant is always situ- 
ated between these min and max bounds, which demonstrates the validity of the Elmore 
approximation. 



4.4.4 The Distributed rc Line 



In the previous paragraphs, we have shown that the lumped RC model is a pessimistic 
model for a resistive-capacitive wire, and that a distributed rc model (Figure 4.14a) is 
more appropriate. As before, L represents the total length of the wire, while r and c stand 
for the resistance and capacitance per unit length. A schematic representation of the 
distributed rc line is given in Figure 4.14b. 

Figure 4.14 Distributed RC line wire-model and its schematic symbol: (a) distributed model, 
built from segments of resistance rΔL and capacitance cΔL; (b) schematic symbol for the 
distributed RC line, labeled (r, c, L). 

The voltage at node i of this network can be determined by solving the following set 
of partial differential equations: 

$$c\Delta L \frac{\partial V_i}{\partial t} = \frac{(V_{i+1} - V_i) + (V_{i-1} - V_i)}{r\Delta L} \tag{4.17}$$

The correct behavior of the distributed rc line is then obtained by reducing ΔL asymptotically 
to 0. For ΔL → 0, Eq. (4.17) becomes the well-known diffusion equation: 

$$rc\frac{\partial V}{\partial t} = \frac{\partial^2 V}{\partial x^2} \tag{4.18}$$

where V is the voltage at a particular point in the wire, and x is the distance between this 
point and the signal source. No closed-form solution exists for this equation, but approximate 
expressions such as the formula presented in Eq. (4.19) can be derived 
[Bakoglu90]. These equations are difficult to use for ordinary circuit analysis. It is known 
however that the distributed rc line can be approximated by a lumped RC ladder network, 
which can be easily used in computer-aided analysis. Some of these models will be presented 
in a later section, discussing SPICE wire models. 
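
To illustrate how a lumped ladder approximates the distributed line, the Python sketch below integrates Eq. (4.17) for an N-segment ladder with a simple forward-Euler time step. It is a rough numerical experiment under stated assumptions, not a substitute for a circuit simulator: the total resistance and capacitance are normalized to 1 so that time is measured in units of RC, the far end of the wire is left open, and the function name, segment count, and time step are arbitrary illustrative choices.

```python
# Forward-Euler integration of Eq. (4.17) for an N-segment RC ladder driven by
# a unit step. R = C = 1, so time is in units of RC. Names are illustrative.


def ladder_crossing_time(n_segments=50, dt=1e-4, threshold=0.5):
    """Time at which the far end of the ladder first reaches `threshold`."""
    n = n_segments
    rate = n * n              # 1 / (r_seg * c_seg) for R = C = 1
    assert dt * rate <= 0.5   # stability limit of the explicit scheme
    v = [0.0] * (n + 1)
    v[0] = 1.0                # node 0 is the driven end (step input)
    t = 0.0
    while v[n] < threshold:
        new_v = v[:]
        for i in range(1, n):
            new_v[i] = v[i] + dt * rate * ((v[i + 1] - v[i]) + (v[i - 1] - v[i]))
        # The last node has no downstream neighbor (open-ended wire).
        new_v[n] = v[n] + dt * rate * (v[n - 1] - v[n])
        v = new_v
        t += dt
    return t


t50 = ladder_crossing_time()
print(f"50% delay of the 50-segment ladder: {t50:.3f} RC")
```

With a few tens of segments the 50% crossing time comes out well below the 0.69 RC of a single lumped stage and slightly above the distributed-line limit, which is consistent with the claim that a modest ladder is an adequate stand-in for the distributed line.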



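$$V_{out}(t) = \begin{cases} 2\,\mathrm{erfc}\left(\sqrt{\dfrac{RC}{4t}}\right) & t \ll RC \\ 1.0 - 1.366\,e^{-2.5359\frac{t}{RC}} + 0.366\,e^{-9.4641\frac{t}{RC}} & t \gg RC \end{cases} \tag{4.19}$$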



Figure 4.15 shows the response of a wire to a step input, plotting the waveforms at 
different points in the wire as a function of time. Observe how the step waveform “dif- 
fuses” from the start to the end of the wire, and the waveform rapidly degrades, resulting 
in a considerable delay for long wires. Driving these rc lines and minimizing the delay and 
signal degradation is one of the trickiest problems in modern digital integrated circuit 
design. It hence will receive considerable attention in later chapters. 




Figure 4.15 Simulated step response of resistive-capacitive wire as a function of time and place. 

Some of the important reference points in the step response of the lumped and the 
distributed RC model of the wire are tabulated in Table 4.7. For instance, the propagation 
delay (defined at 50% of the final value) of the lumped network not surprisingly equals 
0.69 RC. The distributed network, on the other hand, has a delay of only 0.38 RC, with R 
and C the total resistance and capacitance of the wire. This confirms the result of Eq. 
(4.16). 



Table 4.7 Step response of lumped and distributed RC networks: points of interest.

Voltage range         Lumped RC network    Distributed RC network
0 -> 50% (t_p)        0.69 RC              0.38 RC
0 -> 63% (τ)          RC                   0.5 RC
10% -> 90% (t_r)      2.2 RC               0.9 RC
0% -> 90%             2.3 RC               1.0 RC
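
The entries of Table 4.7 can be checked numerically. The sketch below evaluates the lumped response 1 - e^(-t/RC) and the two branches of Eq. (4.19), with time normalized to RC. The function names are illustrative and not part of the original text. Note that the large-t branch is applied here near t = 0.38 RC, outside its strict range of validity, so agreement with the rounded table entries is only approximate.

```python
import math

def v_lumped(t_over_rc):
    """Step response of a single lumped RC stage, time normalized to RC."""
    return 1.0 - math.exp(-t_over_rc)

def v_far_end_small_t(t_over_rc):
    """Eq. (4.19), branch for t << RC: 2 erfc(sqrt(RC / 4t))."""
    return 2.0 * math.erfc(math.sqrt(1.0 / (4.0 * t_over_rc)))

def v_far_end_large_t(t_over_rc):
    """Eq. (4.19), branch for t >> RC."""
    return (1.0
            - 1.366 * math.exp(-2.5359 * t_over_rc)
            + 0.366 * math.exp(-9.4641 * t_over_rc))

# (label, lumped time, distributed time), both in units of RC, from Table 4.7
rows = [("0 -> 50%", 0.69, 0.38), ("0 -> 63%", 1.0, 0.5), ("0 -> 90%", 2.3, 1.0)]
for label, t_lumped, t_dist in rows:
    print(f"{label}: lumped {v_lumped(t_lumped):.3f}, "
          f"distributed {v_far_end_large_t(t_dist):.3f}")
```

Both columns land within a few percent of the nominal 50%, 63%, and 90% levels, which is consistent with the rounded coefficients in the table.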



Example 4.8 RC delay of Aluminum Wire

Let us consider again the 10 cm long, 1 μm wide Al1 wire of Example 4.1. In Example 4.4, we
derived the following values for r and c:

c = 110 aF/μm; r = 0.075 Ω/μm

Using the entry of Table 4.7, we derive the propagation delay of the wire:

t_p = 0.38 RC = 0.38 × (0.075 Ω/μm) × (110 aF/μm) × (10^5 μm)^2 = 31.4 nsec

We can also deduce the propagation delays of an identical wire implemented in polysilicon
and Al5. The values of the capacitances are obtained from Table 4.2, while the resistances are
assumed to be respectively 150 Ω/μm and 0.0375 Ω/μm for Poly and Al5:

Poly: t_p = 0.38 × (150 Ω/μm) × (88 + 2 × 54 aF/μm) × (10^5 μm)^2 = 112 μsec!

Al5: t_p = 0.38 × (0.0375 Ω/μm) × (5.2 + 2 × 12 aF/μm) × (10^5 μm)^2 = 4.2 nsec

Obviously, the choice of the interconnect material and layer has a dramatic impact on the delay of
the wire.
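
The arithmetic of Example 4.8 is easy to reproduce. The following sketch applies t_p = 0.38 r c L^2 to the three wires, using the per-unit-length values quoted in the example. The function and dictionary names are illustrative only.

```python
def distributed_rc_delay(r_per_um, c_per_um, length_um):
    """Propagation delay (s) of a distributed rc line: 0.38 * r * c * L^2."""
    return 0.38 * r_per_um * c_per_um * length_um ** 2

LENGTH_UM = 1e5  # 10 cm expressed in micrometers

# (resistance in ohm/um, capacitance in F/um), values as quoted in Example 4.8
wires = {
    "Al1": (0.075, 110e-18),
    "Poly": (150.0, (88 + 2 * 54) * 1e-18),
    "Al5": (0.0375, (5.2 + 2 * 12) * 1e-18),
}

delays = {name: distributed_rc_delay(r, c, LENGTH_UM)
          for name, (r, c) in wires.items()}

for name, t_p in delays.items():
    print(f"{name}: t_p = {t_p:.3e} s")
```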



An important question for a designer to answer when analyzing an interconnect network is
whether the effects of RC delays should be considered, or whether she can get away with a
simpler lumped capacitive model. A simple rule of thumb proves to be very useful here.



Design Rules of Thumb 






• rc delays should only be considered when t_pRC >> t_pgate of the driving gate.

This translates into Eq. (4.20), which determines the critical length L_crit of the interconnect
wire where RC delays become dominant.

$$L_{crit} \gg \sqrt{\frac{t_{pgate}}{0.38\,rc}} \qquad (4.20)$$

The actual value of L_crit depends upon the sizing of the driving gate and the chosen inter-
connect material.



• rc delays should only be considered when the rise (fall) time at the line input is 
smaller than RC, the rise (fall) time of the line. 

t_rise < RC (4.21)

with R and C the total resistance and capacitance of the wire. When this condition is not 
met, the change in signal is slower than the propagation delay of the wire, and a lumped capac- 
itive model suffices. 
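
Both rules of thumb can be turned into small checks. The sketch below evaluates Eq. (4.20) and Eq. (4.21). The 100 psec gate delay is a hypothetical value chosen purely for illustration; the r and c values are those of the Al1 wire of Example 4.8.

```python
import math

def critical_length_um(t_pgate_s, r_per_um, c_per_um):
    """Eq. (4.20): wire length (um) at which 0.38 r c L^2 equals the gate delay."""
    return math.sqrt(t_pgate_s / (0.38 * r_per_um * c_per_um))

def needs_rc_model(t_rise_s, r_total, c_total):
    """Eq. (4.21): an RC model is warranted when the input rise time is below RC."""
    return t_rise_s < r_total * c_total

# Hypothetical 100 psec gate driving an Al1 wire (r = 0.075 ohm/um, c = 110 aF/um)
l_crit = critical_length_um(100e-12, 0.075, 110e-18)
print(f"L_crit = {l_crit / 1000:.2f} mm")
```

Under these assumptions the critical length comes out at several millimeters, so only the longer on-chip wires would call for the distributed model.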



Example 4.9 RC versus Lumped C 

The presented rule can be illustrated with the aid of the simple circuit shown in Figure
4.16. It is assumed here that the driving gate can be modeled as a voltage source with a
finite source resistance R_S. The total propagation delay of the network can be approximated
by the following expression, obtained by applying the Elmore formula (see footnote 2):

Figure 4.16 rc-line of length L driven by source with resistance equal to R_S.

$$\tau_D = R_S C_w + \frac{R_w C_w}{2} = R_S C_w + 0.5\,r_w c_w L^2$$

and

$$t_p = 0.69\,R_S C_w + 0.38\,R_w C_w$$

with R_w = rL and C_w = cL. The delay introduced by the wire resistance becomes dominant
when (R_w C_w)/2 > R_S C_w, or L > 2R_S/r. Assume now a driver with a source resistance of
1 kΩ, driving an Al1 wire of 1 μm wide (r = 0.075 Ω/μm). This leads to a critical length of
2.67 cm.
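
A short numerical sketch of Example 4.9 follows. It evaluates the delay expression t_p = 0.69 R_S C_w + 0.38 R_w C_w and the length L = 2 R_S / r beyond which the wire resistance dominates. The function names are illustrative, and the capacitance of 110 aF/μm is carried over from Example 4.8 as an assumption.

```python
def driven_wire_delay(r_source, r_per_um, c_per_um, length_um):
    """t_p = 0.69 R_S C_w + 0.38 R_w C_w for a source-driven rc line."""
    r_w = r_per_um * length_um
    c_w = c_per_um * length_um
    return 0.69 * r_source * c_w + 0.38 * r_w * c_w

def wire_dominant_length_um(r_source, r_per_um):
    """Length (um) beyond which R_w C_w / 2 exceeds R_S C_w, i.e. L = 2 R_S / r."""
    return 2.0 * r_source / r_per_um

l_dom = wire_dominant_length_um(1000.0, 0.075)
print(f"critical length = {l_dom / 1e4:.2f} cm")
print(f"t_p at that length = {driven_wire_delay(1000.0, 0.075, 110e-18, l_dom):.3e} s")
```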



2 Hint: replace the wire by the lumped RC network of Figure 4.13 and apply the Elmore equation on the 
resulting network. 
4.4.5 The Transmission Line

When the switching speeds of the circuits become sufficiently fast, and the quality of the
interconnect material becomes high enough so that the resistance of the wire is kept within
bounds, the inductance of the wire starts to dominate the delay behavior, and transmission 
line effects must be considered. This is more precisely the case when the rise and fall times 
of the signal become comparable to the time of flight of the signal waveform across the 
line as determined by the speed of light. With the advent of Copper interconnect and the 
high switching speeds enabled by the deep-submicron technologies, transmission line 
effects are soon to be considered in the fastest CMOS designs. 

In this section, we first analyze the transmission line model. Next, we apply it to the 
current semiconductor technology and determine when those effects should be actively 
considered in the design process. 

Transmission Line Model 

Similar to the resistance and capacitance of an interconnect line, the inductance is distrib- 
uted over the wire. A distributed rlc model of a wire, known as the transmission line 
model, becomes the most accurate approximation of the actual behavior. The transmission 
line has the prime property that a signal propagates over the interconnection medium as a 
wave. This is in contrast to the distributed rc model, where the signal diffuses from the 
source to the destination governed by the diffusion equation, Eq. (4.18). In the wave 
mode, a signal propagates by alternately transferring energy from the electric to the
magnetic fields, or equivalently from the capacitive to the inductive modes. 

Consider the point x along the transmission line of Figure 4.17 at time t. The follow- 
ing set of equations holds: 

$$\frac{\partial v}{\partial x} = -r\,i - l\,\frac{\partial i}{\partial t}, \qquad \frac{\partial i}{\partial x} = -g\,v - c\,\frac{\partial v}{\partial t} \qquad (4.22)$$






Assuming that the leakage conductance g equals 0, which is true for most insulating mate- 
rials, and eliminating the current i yields the wave propagation equation, Eq. (4.23). 



$$\frac{\partial^2 v}{\partial x^2} = rc\,\frac{\partial v}{\partial t} + lc\,\frac{\partial^2 v}{\partial t^2} \qquad (4.23)$$

where r, c, and l are the resistance, capacitance, and inductance per unit length,
respectively. 

To understand the behavior of the transmission line, we will first assume that the 
resistance of the line is small. In this case, a simplified capacitive/inductive model, called 
the lossless transmission line , is appropriate. This model is applicable for wires at the 
printed-circuit board level. Due to the high conductivity of the Copper interconnect mate- 
rial used there, the resistance of the transmission line can be ignored. On the other hand, 
resistance plays an important role in integrated circuits, and a more complex model, called 
the lossy transmission line should be considered. The lossy model is only discussed briefly 
at the end. 



The Lossless Transmission Line 

For the lossless line, Eq. (4.23) simplifies to the ideal wave equation: 



$$\frac{\partial^2 v}{\partial x^2} = lc\,\frac{\partial^2 v}{\partial t^2} = \frac{1}{\nu^2}\,\frac{\partial^2 v}{\partial t^2} \qquad (4.24)$$



A step input applied to a lossless transmission line propagates along the line with a
speed ν, given by Eq. (4.11) and repeated below.



$$\nu = \frac{1}{\sqrt{lc}} = \frac{1}{\sqrt{\varepsilon\mu}} = \frac{c_0}{\sqrt{\varepsilon_r \mu_r}} \qquad (4.25)$$



Even though the values of both l and c depend on the geometric shape of the wire, their
product is a constant and is only a function of the surrounding media. The propagation 
delay per unit wire length (t p ) of a transmission line is the inverse of the speed: 



$$t_p = \sqrt{lc} \qquad (4.26)$$

Let us now analyze how a wave propagates along a lossless transmission line. Suppose 
that a voltage step V has been applied at the input and has propagated to point x of the line 
(Figure 4.18). All currents are equal to 0 at the right side of x, while the voltage over the 
line equals V at the left side. An additional capacitance cdx must be charged for the wave 
to propagate over an additional distance dx. This requires the following current: 



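$$I = \frac{dQ}{dt} = c\,\frac{dx}{dt}\,V = c\,\nu\,V = \sqrt{\frac{c}{l}}\,V \qquad (4.27)$$

since the propagation speed of the signal dx/dt equals ν. This means that the signal sees
the remainder of the line as a real impedance,

$$Z_0 = \frac{V}{I} = \sqrt{\frac{l}{c}} = \frac{\sqrt{\varepsilon\mu}}{c} = \frac{1}{c\,\nu}. \qquad (4.28)$$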

This impedance, called the characteristic impedance of the line, is a function of the 
dielectric medium and the geometry of the conducting wire and isolator (Eq. (4.28)), and 
is independent of the length of the wire and the frequency. That a line of arbitrary length 
has a constant, real impedance is a great feature as it simplifies the design of the driver
circuitry. Typical values of the characteristic impedance of wires in semiconductor circuits
range from 10 to 200 Ω.

Figure 4.18 Propagation of voltage step along a lossless transmission line.



Example 4.10 Propagation Speeds of Signal Waveforms 

The information of Table 4.6 shows that it takes 1.5 nsec for a signal wave to propagate from 
source-to-destination on a 20 cm wire deposited on an epoxy printed-circuit board. If trans- 
mission line effects were an issue on silicon integrated circuits, it would take 0.67 nsec for the 
signal to reach the end of a 10 cm wire. 
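
The numbers in Example 4.10 follow directly from Eq. (4.25), and Eq. (4.28) gives the matching characteristic impedance. The sketch below assumes relative permittivities of 5.0 for the epoxy board and 3.9 for silicon dioxide; these are typical values supplied here for illustration, not taken from the text above. The 110 aF/μm capacitance is the Al1 value from Example 4.8.

```python
import math

C0_CM_PER_NS = 29.98  # speed of light in vacuum, in cm/nsec

def propagation_speed_cm_per_ns(eps_r, mu_r=1.0):
    """Eq. (4.25): wave speed in a medium with relative permittivity eps_r."""
    return C0_CM_PER_NS / math.sqrt(eps_r * mu_r)

def time_of_flight_ns(length_cm, eps_r, mu_r=1.0):
    """Time for a wave to traverse a wire of the given length."""
    return length_cm / propagation_speed_cm_per_ns(eps_r, mu_r)

def characteristic_impedance(c_per_m, speed_m_per_s):
    """Eq. (4.28): Z_0 = 1 / (c * nu), with c in F/m and nu in m/s."""
    return 1.0 / (c_per_m * speed_m_per_s)

print(f"20 cm on epoxy board: {time_of_flight_ns(20.0, 5.0):.2f} nsec")
print(f"10 cm on SiO2:        {time_of_flight_ns(10.0, 3.9):.2f} nsec")
# 110 aF/um equals 110 pF/m; speed taken as 15 cm/nsec = 1.5e8 m/s
print(f"Z_0 for c = 110 aF/um: {characteristic_impedance(110e-12, 1.5e8):.1f} ohm")
```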



WARNING: The characteristic impedance of a wire is a function of the overall intercon- 
nect topology. The electro-magnetic fields in complex interconnect structures tend to be 
irregular, and are strongly influenced by issues such as the current return path. Providing a 
general answer to the latter problem has so far proven to be elusive, and no closed-form
analytical solutions are typically available. Hence, accurate inductance and characteristic 
impedance extraction is still an active research topic. For some simplified structures, 
approximative expressions have been derived. For instance, the characteristic impedances 
of a triplate strip-line (a wire embedded in between two ground planes) and a semiconduc- 
tor micro strip-line (wire above a semiconductor substrate) are approximated by Eq. (4.29) 
and Eq. (4.30), respectively. 

$$Z_0(\text{triplate}) \approx 94\,\Omega\,\sqrt{\frac{\mu_r}{\varepsilon_r}}\,\ln\!\left(\frac{2t + W}{H + W}\right) \qquad (4.29)$$

and 




$$Z_0(\text{microstrip}) = 60\,\Omega\,\sqrt{\frac{\mu_r}{0.475\,\varepsilon_r + 0.67}}\,\ln\!\left(\frac{4t}{0.536\,W + 0.67\,H}\right) \qquad (4.30)$$



Termination 

The behavior of the transmission line is strongly influenced by the termination of the line. 
The termination determines how much of the wave is reflected upon arrival at the wire 
end. This is expressed by the reflection coefficient ρ that determines the relationship
between the voltages and currents of the incident and reflected waveforms.



$$\rho = \frac{V_{refl}}{V_{inc}} = \frac{I_{refl}}{I_{inc}} = \frac{R - Z_0}{R + Z_0} \qquad (4.31)$$



where R is the value of the termination resistance. The total voltages and currents at the 
termination end are the sum of incident and reflected waveforms. 



$$V = V_{inc}\,(1 + \rho), \qquad I = I_{inc}\,(1 - \rho) \qquad (4.32)$$



Three interesting cases can be distinguished, as illustrated in Figure 4.19. In case
(a) the terminating resistance is equal to the characteristic impedance of the line. The
termination appears as an infinite extension of the line, and no waveform is reflected. This
is also demonstrated by the value of ρ, which equals 0. In case (b), the line termination is
an open circuit (R = ∞), and ρ = 1. The total voltage waveform after reflection is twice
the incident one as predicted by Eq. (4.32). Finally, in case (c) where the line termination
is a short circuit, R = 0, and ρ = -1. The total voltage waveform after reflection equals
zero.

Figure 4.19 Behavior of various transmission line terminations: (a) matched termination, (b) open-circuit termination, (c) short-circuit termination.
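
The three termination cases map onto a few lines of code. This sketch implements Eq. (4.31) and the voltage relation of Eq. (4.32); the open circuit is represented by an infinite resistance and handled as a special case, which is a choice made here for illustration.

```python
import math

def reflection_coefficient(r_term, z0):
    """Eq. (4.31): rho = (R - Z_0) / (R + Z_0); an open circuit gives rho = 1."""
    if math.isinf(r_term):
        return 1.0
    return (r_term - z0) / (r_term + z0)

def terminal_voltage(v_inc, r_term, z0):
    """Eq. (4.32): total voltage at the termination, V = V_inc (1 + rho)."""
    return v_inc * (1.0 + reflection_coefficient(r_term, z0))

Z0 = 50.0
for label, r_term in [("matched", Z0), ("open", math.inf), ("short", 0.0)]:
    rho = reflection_coefficient(r_term, Z0)
    print(f"{label}: rho = {rho:+.1f}, V = {terminal_voltage(2.5, r_term, Z0):.2f} V")
```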

The transient behavior of a complete transmission line can now be examined. It is
influenced by the characteristic impedance of the line, the series impedance of the source
Z_S, and the loading impedance Z_L at the destination end, as shown in Figure 4.20.

Consider first the case where the wire is open at the destination end, or Z_L = ∞ and
ρ_L = 1. An incoming wave is completely reflected without phase reversal. Under the
assumption that the source impedance is resistive, three possible scenarios are sketched in
Figure 4.21: R_S = 5 Z_0, R_S = Z_0, and R_S = Z_0/5.

1. Large source resistance: R_S = 5 Z_0 (Figure 4.21a)

Figure 4.20 Transmission line with terminating impedances.

Figure 4.21 panels: (a) R_S = 5 Z_0, R_L = ∞; (b) R_S = Z_0, R_L = ∞; (c) R_S = Z_0/5, R_L = ∞.



Only a small fraction of the incoming signal V_in is injected into the transmission
line. The amount injected is determined by the resistive divider formed by the source
resistance and the characteristic impedance Z_0.

$$V_{source} = \frac{Z_0}{Z_0 + R_S}\,V_{in} = \frac{1}{6} \times 5\ \mathrm{V} = 0.83\ \mathrm{V} \qquad (4.33)$$

This signal reaches the end of the line after L/ν sec, where L stands for the length of
the wire, and is fully reflected, which effectively doubles the amplitude of the wave
(V_dest = 1.67 V). The time it takes for the wave to propagate from one end of the wire to
the other is called the time-of-flight, t_flight = L/ν. Approximately the same happens when
the wave reaches the source node again. The incident waveform is reflected with an
amplitude determined by the source reflection coefficient, which equals 2/3 for this particular
case.

$$\rho_S = \frac{5Z_0 - Z_0}{5Z_0 + Z_0} = \frac{2}{3} \qquad (4.34)$$

The voltage amplitude at source and destination nodes gradually reaches its final value of
V_in. The overall rise time is, however, many times L/ν.

When multiple reflections are present, as in the above case, keeping track of waves 
on the line and total voltage levels rapidly becomes cumbersome. Therefore a graphical 
construction called the lattice diagram is often used to keep track of the data (Figure 
4.22). The diagram contains the values of the voltages at the source and destination ends, 
as well as the values of the incident and reflected waveforms. The line voltage at a
termination point equals the sum of the previous voltage, the incident, and reflected waves.
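
The bookkeeping of a lattice diagram is mechanical and can be sketched in a few lines. The function below tracks the source and destination voltages of a line with a resistive source and an open-circuit load (ρ_L = 1 by default) for an ideal step input. The function name and the tuple layout are illustrative choices, not notation from the text.

```python
def lattice(v_in, r_source, z0, rho_load=1.0, n_trips=6):
    """Return [(trip, v_source, v_dest), ...] after each one-way trip of the wave.

    Trip 0 is the launch at the source; odd trips end at the destination,
    even trips end back at the source.
    """
    rho_source = (r_source - z0) / (r_source + z0)
    wave = v_in * z0 / (z0 + r_source)   # resistive divider, Eq. (4.33)
    v_source, v_dest = wave, 0.0
    history = [(0, v_source, v_dest)]
    for trip in range(1, n_trips + 1):
        if trip % 2 == 1:                # wave arrives at the destination
            reflected = rho_load * wave
            v_dest += wave + reflected
        else:                            # wave arrives back at the source
            reflected = rho_source * wave
            v_source += wave + reflected
        wave = reflected
        history.append((trip, v_source, v_dest))
    return history

# Case 1 of the text: R_S = 5 Z_0, open load, 5 V step
for trip, v_s, v_d in lattice(5.0, 250.0, 50.0):
    print(f"t = {trip} t_flight: V_source = {v_s:.4f} V, V_dest = {v_d:.4f} V")
```

Calling `lattice(5.0, 50.0, 50.0)` reproduces the matched case, where the destination reaches 5 V after a single time-of-flight, and `lattice(5.0, 10.0, 50.0)` shows the overshoot and ringing of the small-source-resistance case.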




2. Small source resistance: R_S = Z_0/5 (Figure 4.21c)

A large portion of the input is injected in the line. Its value is doubled at the destina-
tion end, which causes a severe overshoot. At the source end, the phase of the signal is
reversed (ρ_S = -2/3). The signal bounces back and forth and exhibits severe ringing. It
takes multiple L/ν before it settles.

3. Matched source resistance: R_S = Z_0 (Figure 4.21b)

Half of the input signal is injected at the source. The reflection at the destination end
doubles the signal, so that the final value is reached immediately. It is obvious that this is
the most effective case.

Note that the above analysis is an ideal one, as it is assumed that the input signal has
a zero rise time. In real conditions the signals are substantially smoother, as demonstrated
in the simulated response of Figure 4.23 (for R_S = Z_0/5 and t_r = t_flight).








Figure 4.23 Simulated transient response of lossless transmission line for finite input rise times (R_S = Z_0/5, t_r = t_flight).



Problem 4.1 Transmission Line Response 

Derive the lattice diagram of the above transmission line for R_S = Z_0/5, R_L = ∞, and V_step = 5
V. Also try the reverse picture: assume that the series resistance of the source equals zero,
and consider different load impedances.




Example 4.11 Capacitive Termination 

Loads in MOS digital circuits tend to be of a capacitive nature. One might wonder how this 
influences the transmission line behavior and when the load capacitance should be taken into 
account. 

The characteristic impedance of the transmission line determines the current that can be
supplied to charge capacitive load C_L. From the load's point of view, the line behaves as a
resistance with value Z_0. The transient response at the capacitor node, therefore, displays a
time constant Z_0 C_L. This is illustrated in Figure 4.24, which shows the simulated transient
response of a series-terminated transmission line with a characteristic impedance of 50 Ω
loaded by a capacitance of 2 pF. The response shows how the output rises to its final value
with a time-constant of 100 psec (= 50 Ω × 2 pF) after a delay equal to the time-of-flight of
the line.

This asymptotic response causes some interesting artifacts. After 2 t_flight, an unexpected
voltage dip occurs at the source node that can be explained as follows. Upon reaching the des-
tination node, the incident wave is reflected. This reflected wave also approaches its final
value asymptotically. Since V_dest equals 0 initially instead of the expected jump to 5 V, the
reflection equals -2.5 V rather than the expected 2.5 V. This forces the transmission line tem-
porarily to 0 V, as shown in the simulation. This effect gradually disappears as the output
node converges to its final value.

The propagation delay of the line equals the sum of the time-of-flight of the line (= 50
psec) and the time it takes to charge the capacitance (= 0.69 Z_0 C_L = 69 psec). This is exactly
what the simulation yields. In general, we can say that the capacitive load should only be con-
sidered in the analysis when its value is comparable to or larger than the total capacitance of
the transmission line [Bakoglu90].
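
The numbers quoted in Example 4.11 can be reproduced with a two-line calculation. The helper name below is illustrative.

```python
def capacitive_termination(z0, c_load, t_flight):
    """Return (time constant, propagation delay) of a capacitively loaded line.

    The load sees the line as a resistance Z_0, so tau = Z_0 * C_L and the
    50% delay is the time-of-flight plus 0.69 * tau.
    """
    tau = z0 * c_load
    t_p = t_flight + 0.69 * tau
    return tau, t_p

tau, t_p = capacitive_termination(50.0, 2e-12, 50e-12)
print(f"tau = {tau * 1e12:.0f} psec, t_p = {t_p * 1e12:.0f} psec")
```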

Figure 4.24 Capacitively terminated transmission line: R_S = 50 Ω, R_L = ∞, C_L = 2 pF, Z_0 = 50 Ω, t_flight = 50 psec.




Lossy Transmission Line 

While board and module wires are thick and wide enough to be treated as lossless trans- 
mission lines, the same is not entirely true for on-chip interconnect where the resistance of 
the wire is an important factor. The lossy transmission-line model should be applied 
instead. Going into detail about the behavior of a lossy line would lead us too far astray. We
therefore only discuss the effects of resistive loss on the transmission line behavior in a 
qualitative fashion. 

The response of a lossy RLC line to a unit step combines wave propagation with a 
diffusive component. This is demonstrated in Figure 4.25, which plots the response of the 
RLC transmission line as a function of distance from the source. The step input still propa- 
gates as a wave through the line. However, the amplitude of this traveling wave is attenu- 
ated along the line: 



$$\frac{V_{step}(x)}{V_{step}(0)} = e^{-\frac{r\,x}{2 Z_0}} \qquad (4.35)$$



The arrival of the wave is followed by a diffusive relaxation to the steady-state value 
at point x. The farther it is from the source, the more the response resembles the behavior 
of a distributed RC line. In fact, the resistive effect becomes dominant, and the line 
behaves as a distributed RC line when R (= rL, the total resistance of the line) >> 2 Z_0.
When R = 5 Z 0 , only 8% of the original step reaches the end of the line. At that point, the 
line is more appropriately modeled as a distributed rc line. 
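
The 8% figure follows from Eq. (4.35) evaluated at the end of the line, where rx equals the total resistance R. A minimal check, with an illustrative function name:

```python
import math

def step_attenuation(r_total, z0):
    """Eq. (4.35) at x = L: fraction of the launched step that reaches the far end."""
    return math.exp(-r_total / (2.0 * z0))

Z0 = 50.0
for ratio in (0.5, 2.0, 5.0):
    frac = step_attenuation(ratio * Z0, Z0)
    print(f"R = {ratio} Z_0: {100 * frac:.1f}% of the step arrives as a wave")
```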

Be aware that the actual wires on chips, boards, or substrates behave in a far more 
complex way than predicted by the above analysis. For instance, branches on wires, often 
called transmission line taps, cause extra reflections and can affect both signal shape and 
delay. Since the analysis of these effects is very involved, the only meaningful approach is 
to use computer analysis and simulation techniques. For a more extensive discussion of 
these effects, we would like to refer the reader to [Bakoglu90] and [Dally98] . 

Figure 4.25 Step response of lossy transmission line.

Design Rules of Thumb




Once again, we have to ask ourselves the question when it is appropriate to consider 
transmission line effects. From the above discussion, we can derive two important constraints: 

• Transmission line effects should be considered when the rise or fall time of the
input signal (t_r, t_f) is smaller than the time-of-flight of the transmission line (t_flight).

This leads to the following rule of thumb, which determines when transmission line
effects should be considered:

$$t_r\,(t_f) < 2.5\,t_{flight} = 2.5\,\frac{L}{\nu} \qquad (4.36)$$

For on-chip wires with a maximum length of 1 cm, one should only worry about trans-
mission line effects when t_r < 150 psec. At the board level, where wires can reach a length of
up to 50 cm, we should account for the delay of the transmission line when t_r < 8 nsec. This
condition is easily achieved with state-of-the-art processes and packaging technologies. Ignor- 
ing the inductive component of the propagation delay can easily result in overly optimistic 
delay predictions. 

• Transmission line effects should only be considered when the total resistance of the 
wire is limited: 



R < 5 Z_0 (4.37)

If this is not the case, the distributed RC model is more appropriate. 

Both constraints can be summarized in the following set of bounds on the wire length: 



$$\frac{t_r}{2.5}\,\frac{1}{\sqrt{lc}} < L < \frac{5}{r}\,\sqrt{\frac{l}{c}} \qquad (4.38)$$



• The transmission line is considered lossless when the total resistance is substantially 
smaller than the characteristic impedance, or 

$$R < \frac{Z_0}{2} \qquad (4.39)$$




Example 4.12 When to Consider Transmission Line Effects 

Consider again our Al1 wire. Using the data from Example 4.4 and Eq. (4.28), we can
approximate the value of Z_0 for various wire widths:

W = 0.1 μm: c = 92 aF/μm; Z_0 = 74 Ω
W = 1.0 μm: c = 110 aF/μm; Z_0 = 60 Ω
W = 10 μm: c = 380 aF/μm; Z_0 = 17 Ω

For a wire with a width of 1 μm, we can derive the maximum length of the wire for
which we should consider transmission line effects using Eq. (4.37):



$$L_{max} = \frac{5 \times 60\ \Omega}{0.075\ \Omega/\mu\mathrm{m}} = 4000\ \mu\mathrm{m}$$



From Eq. (4.36), we find a corresponding maximum rise (or fall) time of the input signal
equal to

t_r,max = 2.5 × (4000 μm)/(15 cm/nsec) = 67 psec

This is hard to accomplish in current technologies. For these wires, a lumped capacitance
model is more appropriate. Transmission line effects are more plausible in wider wires. For a
10 μm wide wire, we find a maximum length of 11.3 mm, which corresponds to a maximum
rise time of 188 psec.

Assume now a Copper wire, implemented on level 5, with a characteristic impedance
of 200 Ω and a resistance of 0.025 Ω/μm. The resulting maximum wire length equals 40 mm.
Rise times smaller than 670 psec will cause transmission line effects to occur.

Be aware however that the values for Z 0 , derived in this example, are only approxima- 
tions. In actual designs, more complex expressions or empirical data should be used. 
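
The bounds used in Example 4.12 can be reproduced as follows. The sketch applies Eq. (4.37) for the maximum length and Eq. (4.36) for the corresponding rise time, with a propagation speed of 15 cm/nsec as in the example. The resistance of 0.0075 Ω/μm for the 10 μm wide wire is an assumption made here (the 1 μm value scaled inversely with width); the function names are illustrative.

```python
SPEED_UM_PER_PS = 150.0  # 15 cm/nsec expressed in um/psec

def max_length_um(z0, r_per_um):
    """Eq. (4.37): longest wire for which R = r L stays below 5 Z_0."""
    return 5.0 * z0 / r_per_um

def max_rise_time_ps(length_um):
    """Eq. (4.36): rise times below 2.5 * t_flight expose transmission line effects."""
    return 2.5 * length_um / SPEED_UM_PER_PS

cases = {
    "Al1, W = 1 um": (60.0, 0.075),
    "Al1, W = 10 um": (17.0, 0.0075),   # r assumed to scale as 1/W
    "Cu, level 5": (200.0, 0.025),
}
for name, (z0, r) in cases.items():
    l_max = max_length_um(z0, r)
    print(f"{name}: L_max = {l_max / 1000:.1f} mm, "
          f"t_r,max = {max_rise_time_ps(l_max):.0f} psec")
```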



Example 4.13 Simulation of Transmission Line Effects 
Show SPICE simulation 



4.5 SPICE Wire Models 

In previous sections, we have discussed the various interconnect parasitics, and introduced 
simple models for each of them. Yet, the full and precise impact of these effects can only 
be found through detailed simulation. In this section, we introduce the models that SPICE 
provides for the capacitive, resistive, and inductive parasitics. 






4.5.1 Distributed rc Lines in SPICE

Because of the importance of the distributed rc-line in today’s design, most circuit simula- 
tors have built-in distributed rc-models of high accuracy. For instance, the Berkeley 
SPICE3 simulator supports a uniform-distributed rc-line model (URC). This model 
approximates the rc-line as a network of lumped RC segments with internally generated 
nodes. Parameters include the length of the wire L and (optionally) the number of seg- 
ments used in the model. 



Example 4.14 SPICE3 URC Model 

A typical example of a SPICE3 instantiation of a distributed rc-line is shown below. N1 and
N2 represent the terminal nodes of the line, while N3 is the node the capacitances are con-
nected to. RPERL and CPERL stand for the resistance and capacitance per meter.

* Nodes are positional: N1 = 1, N2 = 2 (line terminals), N3 = 0 (capacitance node)
U1 1 2 0 URCMOD L=50m N=6
.MODEL URCMOD URC(RPERL=75K CPERL=100pF)



If your simulator does not support a distributed rc-model, or if the computational 
complexity of these models slows down your simulation too much, you can construct a 
simple yet accurate model yourself by approximating the distributed rc by a lumped RC 
network with a limited number of elements. Figure 4.26 shows some of these approxima- 
tions ordered along increasing precision and complexity. The accuracy of the model is 
determined by the number of stages. For instance, the error of the π3 model is less than
3%, which is generally sufficient. 
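
How quickly a lumped ladder approaches the distributed line can be illustrated with the Elmore time constant. The sketch below uses a plain ladder of N identical sections (series R/N followed by shunt C/N), which is a simplification and not one of the π or T models of Figure 4.26. Its Elmore time constant equals RC (N + 1)/(2N) and tends to RC/2 as N grows.

```python
def ladder_elmore_delay(r_total, c_total, n_segments):
    """Elmore time constant at the far end of an N-section RC ladder.

    Each section is a series resistor R/N followed by a shunt capacitor C/N;
    capacitor k is charged through k series resistors.
    """
    r_seg = r_total / n_segments
    c_seg = c_total / n_segments
    return sum(k * r_seg * c_seg for k in range(1, n_segments + 1))

for n in (1, 2, 3, 10, 100):
    print(f"N = {n:3d}: tau = {ladder_elmore_delay(1.0, 1.0, n):.4f} RC")
```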

4.5.2 Transmission Line Models in SPICE 

SPICE supports a lossless transmission line model. The line characteristics are defined by 
the characteristic impedance Z 0 , while the length of the line can be defined in either of two 
forms. A first approach is to directly define the transmission delay TD , which is equiva- 
lent to the time-of-flight. Alternatively, a frequency F may be given together with NL, the 
dimensionless, normalized electrical length of the transmission line, which is measured 
with respect to the wavelength in the line at the frequency F. The following relation is 
valid. 



NL = F · TD (4.40)

No lossy transmission line model is currently provided. When necessary, loss can be 
added by breaking up a long transmission line into shorter sections and adding a small 
series resistance in each section to model the transmission line loss. Be careful when using 
this approximation. First of all, the accuracy is still limited. Secondly, the simulation 
speed might be severely affected, since SPICE chooses a time step that is less than or
equal to half of the value of TD. For small transmission lines, this time step might be much 
smaller than what is needed for transistor analysis. 
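
The sectioning approach described above is tedious to write by hand, so a small generator helps. The sketch below emits one series resistor and one lossless T element per section, using the standard SPICE element forms `R<name> n1 n2 value` and `T<name> n1 n2 n3 n4 Z0=... TD=...`. The function name, node naming scheme, and example values are hypothetical; ground is assumed to be node 0.

```python
def lossy_line_netlist(n_sections, z0, td_total, r_total,
                       node_in="in", node_out="out"):
    """Return SPICE netlist lines approximating a lossy line by N sections.

    Each section is a series resistor R/N followed by a lossless
    transmission line with delay TD/N.
    """
    if n_sections < 1:
        raise ValueError("n_sections must be at least 1")
    td = td_total / n_sections
    r = r_total / n_sections
    lines = []
    prev = node_in
    for k in range(1, n_sections + 1):
        mid = f"m{k}"                                   # node between R and T
        nxt = node_out if k == n_sections else f"n{k}"  # output of this section
        lines.append(f"R{k} {prev} {mid} {r:g}")
        lines.append(f"T{k} {mid} 0 {nxt} 0 Z0={z0:g} TD={td:g}")
        prev = nxt
    return lines

netlist = lossy_line_netlist(4, z0=50.0, td_total=40e-12, r_total=40.0)
print("\n".join(netlist))
```

Keep in mind the caveat stated above: every extra section shortens TD and therefore the simulator time step, so the number of sections trades accuracy against run time.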

Figure 4.26 Simulation models for distributed RC line.



4.6 Perspective: A Look into the Future 

Similar to the approach we followed for the MOS transistor, it is worthwhile to explore 
how the wire parameters will evolve with further scaling of the technology. As transistor 
dimensions are reduced, the interconnect dimensions must also be reduced to take full 
advantage of the scaling process. 

A straightforward approach is to scale all dimensions of the wire by the same factor 
S as the transistors ( ideal scaling). This might not be possible for at least one dimension of 
the wire, being the length. It can be surmised that the length of local interconnections — 
wires that connect closely grouped transistors — scales in the same way as these transis- 
tors. On the other hand, global interconnections, that provide the connectivity between 
large modules and the input-output circuitry, display a different scaling behavior. Exam- 
ples of such wires are clock signals, and data and instruction buses. Figure 4.27 contains a 
histogram showing the distribution of the wire lengths in an actual microprocessor design
(containing approximately 90,000 gates) [Davis98]. While most of the wires tend to be only
a couple of gate pitches long, a substantial number of them are much longer and can reach 
lengths up to 500 gate pitches. 

The average length of these long wires is proportional to the die size (or complexity) 
of the circuit. An interesting trend is that while transistor dimensions have continued to 
shrink over the last decades, the chip sizes have gradually increased. In fact, the size of the 
typical die (which is the square root of the die area) is increasing by 6% per year, doubling
about every decade. Chips have scaled from 2 mm x 2 mm in the early 1960s to approxi-
mately 2 cm x 2 cm in 2000. They are projected to reach 4 cm on the side by 2010!

Figure 4.27 Distribution of wire lengths in an advanced microprocessor as a function of the gate pitch.

This argues that when studying the scaling behavior of the wire length, we have to
differentiate between local and global wires. In our subsequent analysis, we will therefore
consider three models: local wires (S_L = S > 1), constant length wires (S_L = 1), and global
wires (S_L = S_C < 1).

Assume now that all other wire dimensions of the interconnect structure (W, H, t)
scale with the technology factor S. This leads to the scaling behavior illustrated in Table
4.8. Be aware that this is only a first-order analysis, intended to look at overall trends.
Effects such as fringing capacitance are ignored, and breakthroughs in semiconductor tech-
nology such as new interconnect and dielectric materials are also not considered.



Table 4.8 Ideal Scaling of Wire Properties

Parameter    Relation    Local Wire    Constant Length    Global Wire
W, H, t                  1/S           1/S                1/S
L                        1/S           1                  1/S_C
C            LW/t        1/S           1                  1/S_C
R            L/WH        S             S^2                S^2/S_C
CR           L^2/Ht      1             S^2                S^2/S_C^2



The eye-catching conclusion of this exercise is that scaling of the technology does
not reduce wire delay (as characterized by the RC time-constant). A constant delay is pre-
dicted for local wires, while the delay of the global wires goes up by 50% per year (for S
= 1.15 and S_C = 0.94). This is in great contrast with the gate delay, which reduces from
year to year. This explains why wire delays are starting to play a predominant role in
today's digital integrated circuit design.
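
The entries of Table 4.8 and the 50% figure can be verified with a few lines. In this sketch, `s` is the technology scaling factor and `s_l` is the length scaling factor (S for local wires, 1 for constant-length wires, S_C for global wires); the function name is illustrative.

```python
def ideal_scaling(s, s_l):
    """Return the scaling of (C, R, RC) under ideal scaling, per Table 4.8.

    W, H, and t scale as 1/s; the wire length L scales as 1/s_l.
    """
    c_scale = 1.0 / s_l        # C ~ L W / t, and W/t is unchanged
    r_scale = s * s / s_l      # R ~ L / (W H)
    return c_scale, r_scale, c_scale * r_scale

S, S_C = 1.15, 0.94
for name, s_l in [("local", S), ("constant length", 1.0), ("global", S_C)]:
    c, r, rc = ideal_scaling(S, s_l)
    print(f"{name}: C x{c:.3f}, R x{r:.3f}, RC x{rc:.3f}")
```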

The ideal scaling approach clearly has problems, as it causes a rapid increase in wire 
resistance. This explains why other interconnect scaling techniques are attractive. One 
option is to scale the wire thickness at a different rate. The “constant resistance” model of
Table 4.9 explores the impact of not scaling the wire thickness at all. While this approach
seemingly has a positive impact on the performance, it causes the fringing and inter-wire
capacitance components to come to the foreground. We therefore introduce an extra
capacitance scaling factor ε_c (> 1), that captures the increasingly horizontal nature of the
capacitance when wire widths and pitches are shrunk while the height is kept constant.



Table 4.9 “Constant Resistance” Scaling of Wire Properties

Parameter    Relation      Local Wire    Constant Length    Global Wire
W, t                       1/S           1/S                1/S
H                          1             1                  1
L                          1/S           1                  1/S_C
C            ε_c LW/t      ε_c/S         ε_c                ε_c/S_C
R            L/WH          1             S                  S/S_C
CR           L^2/Ht        ε_c/S         ε_c S              ε_c S/S_C^2



This scaling scenario offers a slightly more optimistic perspective, assuming of
course that ε_c < S. Yet, delay is bound to increase substantially for intermediate and long
wires, independent of the scaling scenario. To keep these delays from becoming exces- 
sive, interconnect technology has to be drastically improved. One option is to use better 
interconnect (Cu) and insulation materials (polymers and air). The other option is to dif- 
ferentiate between local and global wires. In the former, density and low-capacitance are 
crucial, while keeping the resistance under control is crucial in the latter. To address these 
conflicting demands, modern interconnect topologies combine a dense and thin wiring 
grid at the lower metal layers with fat, widely spaced wires at the higher levels, as is illus- 
trated in Figure 4.28. Even with these advances, it is obvious that interconnect will play a 
dominant role in both high-performance and low-energy circuits for years to come. 



4.7 Summary 

This chapter has presented a careful and in-depth analysis of the role and the behavior of 
the interconnect wire in modern semiconductor technology. The main goal is to identify 
the dominant parameters that set the values of the wire parasitics (being capacitance, resis- 
tance, and inductance), and to present adequate wire models that will aid us in the further 
analysis and optimization of complex digital circuits. 

Figure 4.28 Interconnect hierarchy of 0.25 μm CMOS process, drawn to scale.



4.8 To Probe Further 

Interconnect and its modeling is a hotly debated topic that receives major attention in
journals and conferences. A number of textbooks and reprint volumes have been published.
[Bakoglu90], [Tewksbury94], and [Dally98] present in-depth coverage of interconnect
issues and are valuable resources for further browsing.

Chapter 4
Problems



1. [M, None, 4.x] Figure 0.1 shows a clock-distribution network. Each segment of the clock
network (between the nodes) is 5 mm long, 3 μm wide, and is implemented in polysilicon. At
each of the terminal nodes (such as R) resides a load capacitance of 100 fF.

a. Determine the average current of the clock driver, given a voltage swing on the clock lines 
of 5 V and a maximum delay of 5 nsec between clock source and destination node R. For 
this part, you may ignore the resistance and inductance of the network.

b. Unfortunately the resistance of the polysilicon cannot be ignored. Assume that each 
straight segment of the network can be modeled as a Π-network. Draw the equivalent
circuit and annotate the values of resistors and capacitors.

c. Determine the dominant time-constant of the clock response at node R. 




Figure 0.1 Clock-distribution network. 



2. [C, SPICE, 4.x] You are designing a clock distribution network in which it is critical to
minimize skew between local clocks (CLK1, CLK2, and CLK3). You have extracted the RC
network of Figure 0.2, which models the routing parasitics of your clock line. Initially, you
notice that the path to CLK3 is shorter than to CLK1 or CLK2. In order to compensate for this
imbalance, you insert a transmission gate in the path of CLK3 to eliminate the skew.

a. Write expressions for the time-constants associated with nodes CLK1, CLK2, and CLK3.
Assume the transmission gate can be modeled as a resistance R_3.

b. If R_1 = R_2 = R_4 = R_5 = R and C_1 = C_2 = C_3 = C_4 = C_5 = C, what value of R_3 is
required to balance the delays to CLK1, CLK2, and CLK3?

c. For R = 750 Ω and C = 200 fF, what (W/L)'s are required in the transmission gate to
eliminate skew? Determine the value of the propagation delay.

d. Simulate the network using SPICE, and compare the obtained results with the manually 
obtained numbers. 

3. [M, None, 4.x] Consider a CMOS inverter followed by a wire of length L. Assume that in the
reference design, inverter and wire contribute equally to the total propagation delay t_pref. You
may assume that the transistors are velocity-saturated. The wire is scaled in line with the ideal
wire scaling model. Assume initially that the wire is a local wire.

a. Determine the new (total) propagation delay as a function of t_pref, assuming that
technology and supply voltage scale with a factor 2. Consider only first-order effects.

b. Perform the same analysis, assuming now that the wire scales as a global wire, and the wire
length scales inversely proportional to the technology.

Figure 0.2 RC clock-distribution network.



c. Repeat b. but assume now that the wire is scaled along the constant resistance model. You 
may ignore the effect of the fringing capacitance. 

d. Repeat b, but assume that the new technology uses a better wiring material that reduces 
the resistivity by half, and a dielectric with a 25% smaller permittivity. 

e. Discuss the energy dissipation of part a. as a function of the energy dissipation of the orig- 
inal design E.. 

f. Determine for each of the statements below if it is true, false, or undefined, and explain in 
one line your answer. 

- When driving a small fan-out, increasing the driver transistor sizes raises the short- 
circuit power dissipation. 

- Reducing the supply voltage, while keeping the threshold voltage constant decreases 
the short-circuit power dissipation. 

- Moving to Copper wires on a chip will enable us to build faster adders. 

- Making a wire wider helps to reduce its RC delay. 

- Going to dielectrics with a lower permittivity will make RC wire delay more impor- 
tant. 

4. [M, None, 4.x] A two-stage buffer is used to drive a metal wire of 1 cm. The first inverter is of
minimum size with an input capacitance C_i = 10 fF, an internal propagation delay t_p0 = 50 ps,
and a load-dependent delay of 5 ps/fF. The width of the metal wire is 3.6 μm. The sheet
resistance of the metal is 0.08 Ω/□, the capacitance value is 0.03 fF/μm², and the fringing field
capacitance is 0.04 fF/μm.

a. What is the propagation delay of the metal wire? 

b. Compute the optimal size of the second inverter. What is the minimum delay through the 
buffer? 

c. If the input to the first inverter has a 25% chance of making a 0-to-1 transition, and the
whole chip is running at 20 MHz with a 2.5 V supply voltage, what is the power
consumed by the metal wire?

5. [M, None, 4.x] To connect a processor to an external memory, an off-chip connection is
necessary. The copper wire on the board is 15 cm long and acts as a transmission line with a
characteristic impedance of 100 Ω (see Figure 0.3). The memory input pins present a very high
impedance, which can be considered infinite. The bus driver is a CMOS inverter consisting of
very large devices: (50/0.25) for the NMOS and (150/0.25) for the PMOS, where all sizes are
in μm. The minimum-size device, (0.25/0.25) for NMOS and (0.75/0.25) for PMOS, has an
on-resistance of 35 kΩ.

a. Determine the time it takes for a change in the signal to propagate from source to
destination (time of flight). The wire inductance per unit length equals 75 × 10^-8 H/m.

b. Determine how long it will take the output signal to stay within 10% of its final value. You 
can model the driver as a voltage source with the driving device acting as a series resis- 
tance. Assume a supply and step voltage of 2.5V. Hint: draw the lattice diagram for the 
transmission line. 

c. Resize the dimensions of the driver to minimize the total delay. 



Figure 0.3 The driver, the connecting copper wire (L = 15 cm), and the
memory block being accessed.

6. [M, None, 4.x] A two-stage buffer is used to drive a metal wire of 1 cm. The first inverter is of
minimum size with an input capacitance C_i = 10 fF and a propagation delay t_p0 = 175 ps when
loaded with an identical gate. The width of the metal wire is 3.6 μm. The sheet resistance of
the metal is 0.08 Ω/□, the capacitance value is 0.03 fF/μm², and the fringing field capacitance
is 0.04 fF/μm.

a. What is the propagation delay of the metal wire? 

b. Compute the optimal size of the second inverter. What is the minimum delay through the 
buffer? 

7. [M, None, 4.x] For the RC tree given in Figure 0.4, calculate the Elmore delay from node A to
node B using the values for the resistors and capacitors given in Table 0.1.




Figure 0.4 RC tree for calculating the delay

Table 0.1 Values of the components in the RC tree of Figure 0.4

Resistor   Value (Ω)   Capacitor   Value (fF)
R1         0.25        C1          250
R2         0.25        C2          750
R3         0.50        C3          250
R4         100         C4          250
R5         0.25        C5          1000
R6         1.00        C6          250
R7         0.75        C7          500
R8         1000        C8          250



8. [M, SPICE, 4.x] In this problem the various wire models and their respective accuracies will
be studied.

a. Compute the 0%-50% delay of a 500 μm × 0.5 μm wire with a resistance of 0.08 Ω/□, an
area capacitance of 30 aF/μm², and a fringing capacitance of 40 aF/μm. Assume the driver
has a 100 Ω resistance and negligible output capacitance.

• Using a lumped model for the wire. 

• Using a PI model for the wire, and the Elmore equations to find tau. (see Chapter 4, figure 
4.26). 

• Using the distributed RC line equations from Chapter 4, section 4.4.4. 

b. Compare your results in part a. with SPICE simulations (be sure to include the source
resistance). For each simulation, measure the 0%-50% time for the output.

• First, simulate a step input to a lumped R-C circuit. 

• Next, simulate a step input to your wire as a PI model. 

• Unfortunately, our version of SPICE does not support the distributed RC model as 
described in your book (Chapter 4, section 4.5.1). Instead, simulate a step input to your 
wire using a PI3 distributed RC model. 

9. [M, None, 4.x] A standard CMOS inverter drives an aluminum wire on the first metal layer.
Assume R_n = 4 kΩ and R_p = 6 kΩ. Also, assume that the output capacitance of the inverter is
negligible in comparison with the wire capacitance. The wire is 0.5 μm wide, and the sheet
resistance is 0.08 Ω/□.

a. What is the "critical length" of the wire? 

b. What is the equivalent capacitance of a wire of this length? (For your capacitance
calculations, use Table 4.2 of your book, and assume there is field oxide underneath and
nothing above the aluminum wire.)

10. [M, None, 4.x] A 10 cm long lossless transmission line on a PC board (relative dielectric
constant = 9, relative permeability = 1) with a characteristic impedance of 50 Ω is driven by a
2.5 V pulse coming from a source with 150 Ω resistance.

a. If the load resistance is infinite, determine the time it takes for a change at the source to
reach the load (time of flight).

Now a 200 Ω load is attached at the end of the transmission line.

b. What is the voltage at the load at t = 3 ns?

c. Draw the lattice diagram and sketch the voltage at the load as a function of time. Determine
how long it takes for the output to be within 1 percent of its final value.

11. [C, SPICE, 4.x] Assume V_DD = 1.5 V. Also, use short-channel transistor models for hand
analysis.







a. Figure 0.5 shows an output driver feeding a 0.2 pF effective fan-out of CMOS gates
through a transmission line. Size the two transistors of the driver to optimize the delay.
Sketch waveforms of V_S and V_L, assuming a square wave input. Label critical voltages
and times.

b. Size down the transistors by m times (m is to be treated as a parameter). Derive a
first-order expression for the time it takes for V_L to settle within 10% of its final voltage
level. Compare the obtained result with the case where no inductance is associated with the
wire. Draw the waveforms of V_L for both cases, and comment.

c. Use the transistors as in part a. Suppose C_L is changed to 20 pF. Sketch waveforms of V_S
and V_L, assuming a square wave input. Label critical voltages and instants.

d. Assume now that the transmission line is lossy. Perform HSPICE simulations for three cases:
R = 100 Ω/cm; R = 2.5 Ω/cm; R = 0.5 Ω/cm. Get the waveforms of V_S, V_L, and the middle
point of the line. Discuss the results.

12. [M, None, 4.x] Consider an isolated 2 mm long and 1 μm wide M1 (Metal 1) wire over a silicon
substrate driven by an inverter that has zero resistance and parasitic output capacitance. How
will the wire delay change for the following cases? Explain your reasoning in each case.

a. If the wire width is doubled.

b. If the wire length is halved.

c. If the wire thickness is doubled.

d. If thickness of the oxide between the M1 and the substrate is doubled.

13. [E, None, 4.x] In an ideal scaling model, where all dimensions and voltages scale with a
factor of S > 1:

a. How does the delay of an inverter scale?

b. If a chip is scaled from one technology to another where all wire dimensions, including the
vertical one and spacing, scale with a factor of S, how does the wire delay scale? How does
the overall operating frequency of a chip scale?

c. Repeat b) for the case where everything scales, except the vertical dimension of wires (it
stays constant).

5.1 Introduction

The inverter is truly the nucleus of all digital designs. Once its operation and properties are 
clearly understood, designing more intricate structures such as NAND gates, adders, mul- 
tipliers, and microprocessors is greatly simplified. The electrical behavior of these com- 
plex circuits can be almost completely derived by extrapolating the results obtained for 
inverters. The analysis of inverters can be extended to explain the behavior of more com- 
plex gates such as NAND, NOR, or XOR, which in turn form the building blocks for mod- 
ules such as multipliers and processors. 

In this chapter, we focus on a single incarnation of the inverter gate: the static
CMOS inverter, or the CMOS inverter for short. This is certainly the most popular
at present, and therefore deserves our special attention. We analyze the gate with respect
to the different design metrics that were outlined in Chapter 1:

• cost, expressed by the complexity and area 

• integrity and robustness , expressed by the static (or steady-state) behavior 

• performance , determined by the dynamic (or transient) response 

• energy efficiency, set by the energy and power consumption 

From this analysis arises a model of the gate that will help us to identify the parame- 
ters of the gate and to choose their values so that the resulting design meets desired speci- 
fications. While each of these parameters can be easily quantified for a given technology, 
we also discuss how they are affected by scaling of the technology. 

While this chapter focuses exclusively on the CMOS inverter, we will see in the
following chapter that the same methodology also applies to other gate topologies.



5.2 The Static CMOS Inverter — An Intuitive Perspective 



Figure 5.1 shows the circuit diagram of a static CMOS inverter. Its operation is readily
understood with the aid of the simple switch model of the MOS transistor, introduced in
Chapter 3 (Figure 3.25): the transistor is nothing more than a switch with an infinite
off-resistance (for |V_GS| < |V_T|), and a finite on-resistance (for |V_GS| > |V_T|). This leads
to the following interpretation of the inverter. When V_in is high and equal to V_DD, the NMOS
transistor is on, while the PMOS is off. This yields the equivalent circuit of Figure 5.2a. A
direct path exists between V_out and the ground node, resulting in a steady-state value of 0
V. On the other hand, when the input voltage is low (0 V), NMOS and PMOS transistors
are off and on, respectively. The equivalent circuit of Figure 5.2b shows that a path exists
between V_DD and V_out, yielding a high output voltage. The gate clearly functions as an
inverter.

Figure 5.1 Static CMOS inverter. V_DD stands for the supply voltage.




Figure 5.2 Switch models of CMOS 
inverter. 



A number of other important properties of static CMOS can be derived from this switch- 
level view: 

• The high and low output levels equal V DD and GND , respectively; in other words, 
the voltage swing is equal to the supply voltage. This results in high noise margins. 

• The logic levels are not dependent upon the relative device sizes, so that the transis- 
tors can be minimum size. Gates with this property are called ratioless. This is in 
contrast with ratioed logic , where logic levels are determined by the relative dimen- 
sions of the composing transistors. 

• In steady state, there always exists a path with finite resistance between the output
and either V_DD or GND. A well-designed CMOS inverter, therefore, has a low output
impedance, which makes it less sensitive to noise and disturbances. Typical values
of the output resistance are in the kΩ range.

• The input resistance of the CMOS inverter is extremely high, as the gate of an MOS 
transistor is a virtually perfect insulator and draws no dc input current. Since the 
input node of the inverter only connects to transistor gates, the steady-state input 
current is nearly zero. A single inverter can theoretically drive an infinite number of 
gates (or have an infinite fan-out) and still be functionally operational; however, 
increasing the fan-out also increases the propagation delay, as will become clear 
below. So, although fan-out does not have any effect on the steady-state behavior, it 
degrades the transient response.

• No direct path exists between the supply and ground rails under steady-state operating
conditions (that is, when the input and outputs remain constant). The absence of
current flow (ignoring leakage currents) means that the gate does not consume any
static power.

SIDELINE: The above observation, while seemingly obvious, is of crucial importance, 
and is one of the primary reasons CMOS is the digital technology of choice at present. The 
situation was very different in the 1970s and early 1980s. All early microprocessors, such 
as the Intel 4004, were implemented in a pure NMOS technology. The lack of comple- 
mentary devices (such as the NMOS and PMOS transistor) in such a technology makes 
the realization of inverters with zero static power non-trivial. The resulting static power 
consumption puts a firm upper bound on the number of gates that can be integrated on a 
single die; hence the forced move to CMOS in the 1980s, when scaling of the technology 
allowed for higher integration densities. 

The nature and the form of the voltage-transfer characteristic (VTC) can be graphi- 
cally deduced by superimposing the current characteristics of the NMOS and the PMOS 
devices. Such a graphical construction is traditionally called a load-line plot. It requires 
that the I-V curves of the NMOS and PMOS devices are transformed onto a common coor- 
dinate set. We have selected the input voltage V in , the output voltage V out and the NMOS 
drain current I DN as the variables of choice. The PMOS I-V relations can be translated into 
this variable space by the following relations (the subscripts n and p denote the NMOS 
and PMOS devices, respectively): 





$$
\begin{aligned}
I_{DSp} &= -I_{DSn} \\
V_{GSn} &= V_{in}; \quad V_{GSp} = V_{in} - V_{DD} \\
V_{DSn} &= V_{out}; \quad V_{DSp} = V_{out} - V_{DD}
\end{aligned}
\tag{5.1}
$$





The load-line curves of the PMOS device are obtained by a mirroring around the x- 
axis and a horizontal shift over V DD . This procedure is outlined in Figure 5.3, where the 
subsequent steps to adjust the original PMOS I-V curves to the common coordinate set V in , 
V out and I Dn are illustrated. 




Figure 5.3 Transforming PMOS I-V characteristic to a common coordinate set
(assuming V_DD = 2.5 V).

Figure 5.4 Load curves for NMOS and PMOS transistors of the static CMOS inverter (V_DD = 2.5 V). The dots
represent the dc operation points for various input voltages.



The resulting load lines are plotted in Figure 5.4. For a dc operating point to be
valid, the currents through the NMOS and PMOS devices must be equal. Graphically, this
means that the dc points must be located at the intersection of corresponding load lines. A 
number of those points (for V in = 0, 0.5, 1, 1.5, 2, and 2.5 V) are marked on the graph. As 
can be observed, all operating points are located either at the high or low output levels. 
The VTC of the inverter hence exhibits a very narrow transition zone. This results from 
the high gain during the switching transient, when both NMOS and PMOS are simulta- 
neously on, and in saturation. In that operation region, a small change in the input voltage 
results in a large output variation. All these observations translate into the VTC of Figure 
5.5. 
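
The graphical load-line construction can also be mimicked numerically. The sketch below is a minimal illustration, not the book's procedure: it assumes the unified MOS current model of Chapter 3 (with all voltages expressed as magnitudes), takes the device parameters from the numbers quoted in Examples 5.1 and 5.2, and uses bisection to find the V_out at which the NMOS and PMOS currents are equal. The names `drain_current` and `vtc_point` are hypothetical helpers.

```python
V_DD = 2.5  # supply voltage (V)

# Device parameters as magnitudes, from the numbers quoted in Examples 5.1 and 5.2.
NMOS = dict(k_prime=115e-6, w_over_l=1.5, v_t=0.43, v_dsat=0.63, lam=0.06)
PMOS = dict(k_prime=30e-6, w_over_l=1.5 * 3.4, v_t=0.40, v_dsat=1.00, lam=0.10)


def drain_current(dev, v_gs, v_ds):
    """Assumed unified MOS model; v_gs and v_ds are magnitudes (>= 0)."""
    v_gt = v_gs - dev["v_t"]
    if v_gt <= 0.0:
        return 0.0  # transistor is off
    v_min = min(v_gt, v_ds, dev["v_dsat"])
    return (dev["k_prime"] * dev["w_over_l"]
            * (v_gt * v_min - v_min ** 2 / 2.0) * (1.0 + dev["lam"] * v_ds))


def vtc_point(v_in, steps=60):
    """Find V_out where the NMOS and PMOS load lines intersect (bisection)."""
    lo, hi = 0.0, V_DD
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        i_n = drain_current(NMOS, v_in, mid)
        # Eq. (5.1) in magnitudes: |V_GSp| = V_DD - V_in, |V_DSp| = V_DD - V_out
        i_p = drain_current(PMOS, V_DD - v_in, V_DD - mid)
        if i_n < i_p:
            lo = mid  # PMOS current dominates: the output must sit higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


for v_in in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    print(f"V_in = {v_in:.1f} V -> V_out = {vtc_point(v_in):.3f} V")
```

Bisection works here because the NMOS current rises and the PMOS current falls monotonically as V_out increases, so the two curves cross at most once for a given V_in.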




Figure 5.5 VTC of static CMOS inverter, 
derived from Figure 5.4 ( V DD = 2.5 V). For each 
operation region, the modes of the transistors are 
annotated — off, res(istive), or sat(urated). 



Figure 5.6 Switch model of dynamic behavior of static CMOS inverter: (a) low-to-high,
(b) high-to-low.

Before going into the analytical details of the operation of the CMOS inverter, a
qualitative analysis of the transient behavior of the gate is appropriate as well. This
response is dominated mainly by the output capacitance of the gate, C_L, which is
composed of the drain diffusion capacitances of the NMOS and PMOS transistors, the
capacitance of the connecting wires, and the input capacitance of the fan-out gates. Assuming
temporarily that the transistors switch instantaneously, we can get an approximate idea of
the transient response by using the simplified switch model again (Figure 5.6). Let us
consider the low-to-high transition first (Figure 5.6a). The gate response time is simply
determined by the time it takes to charge the capacitor C_L through the resistor R_p. In Example
4.5, we learned that the propagation delay of such a network is proportional to its time
constant R_p C_L. Hence, a fast gate is built either by keeping the output capacitance
small or by decreasing the on-resistance of the transistor. The latter is achieved by
increasing the W/L ratio of the device. Similar considerations are valid for the high-to-low
transition (Figure 5.6b), which is dominated by the R_n C_L time constant. The reader should
be aware that the on-resistance of the NMOS and PMOS transistor is not constant, but is a
nonlinear function of the voltage across the transistor. This complicates the exact
determination of the propagation delay. An in-depth analysis of how to analyze and optimize the
performance of the static CMOS inverter is offered in Section 5.4.
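
As a rough numerical illustration of the switch-model estimate, the Python sketch below computes the 50% delay of a first-order RC network, ln(2)·R·C, for both transitions. The resistance and capacitance values are hypothetical and chosen only for illustration; treating the on-resistance as a constant is exactly the simplification the paragraph above warns about.

```python
import math


def rc_delay(r_on, c_load):
    """50% propagation delay of a single-pole RC network: ln(2) * R * C."""
    return math.log(2.0) * r_on * c_load


# Hypothetical values, not taken from the text
R_P = 6e3      # equivalent PMOS on-resistance (ohm)
R_N = 4e3      # equivalent NMOS on-resistance (ohm)
C_L = 10e-15   # total load capacitance (F)

t_plh = rc_delay(R_P, C_L)  # low-to-high: C_L charges through R_p
t_phl = rc_delay(R_N, C_L)  # high-to-low: C_L discharges through R_n
print(f"t_pLH = {t_plh * 1e12:.1f} ps, t_pHL = {t_phl * 1e12:.1f} ps")
```

Halving either C_L or the on-resistance halves the corresponding delay, which is the design lever described in the text.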




5.3 Evaluating the Robustness of the CMOS Inverter: The Static Behavior 

In the qualitative discussion above, the overall shape of the voltage-transfer characteristic
of the static CMOS inverter was derived, as were the values of V_OH and V_OL (V_DD and
GND, respectively). It remains to determine the precise values of V_M, V_IH, and V_IL as well
as the noise margins.

5.3.1 Switching Threshold 

The switching threshold, V_M, is defined as the point where V_in = V_out. Its value can be
obtained graphically from the intersection of the VTC with the line given by V_in = V_out
(see Figure 5.5). In this region, both PMOS and NMOS are always saturated, since V_DS =
V_GS. An analytical expression for V_M is obtained by equating the currents through the
transistors. We solve the case where the supply voltage is high so that the devices can be
assumed to be velocity-saturated (or V_DSAT < V_M - V_T). We furthermore ignore the
channel-length modulation effects.



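$$
k_n V_{DSATn}\left(V_M - V_{Tn} - \frac{V_{DSATn}}{2}\right) + k_p V_{DSATp}\left(V_M - V_{DD} - V_{Tp} - \frac{V_{DSATp}}{2}\right) = 0 \tag{5.2}
$$

Solving for V_M yields

$$
V_M = \frac{\left(V_{Tn} + \dfrac{V_{DSATn}}{2}\right) + r\left(V_{DD} + V_{Tp} + \dfrac{V_{DSATp}}{2}\right)}{1 + r}
\quad \text{with} \quad
r = \frac{k_p V_{DSATp}}{k_n V_{DSATn}} = \frac{\upsilon_{satp} W_p}{\upsilon_{satn} W_n} \tag{5.3}
$$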

assuming identical oxide thicknesses for PMOS and NMOS transistors. For large values
of V_DD (compared to threshold and saturation voltages), Eq. (5.3) can be simplified:

$$
V_M \approx \frac{r V_{DD}}{1 + r} \tag{5.4}
$$

Eq. (5.4) states that the switching threshold is set by the ratio r, which compares the
relative driving strengths of the PMOS and NMOS transistors. It is generally considered to be
desirable for V_M to be located around the middle of the available voltage swing (or at
V_DD/2), since this results in comparable values for the low and high noise margins. This
requires r to be approximately 1, which is equivalent to sizing the PMOS device so that
(W/L)_p = (W/L)_n × (V_DSATn k'_n)/(V_DSATp k'_p). To move V_M upwards, a larger value of r is
required, which means making the PMOS wider. Increasing the strength of the NMOS, on
the other hand, moves the switching threshold closer to GND.
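
The expressions for the switching threshold are easy to evaluate. The Python sketch below is a minimal illustration of Eq. (5.3) and Eq. (5.4); the function names are hypothetical, the sign convention (negative k_p, V_Tp, and V_DSATp for the PMOS) is assumed from Eq. (5.2), and the parameter values are those quoted later in Examples 5.1 and 5.2 for a PMOS/NMOS ratio of 3.4.

```python
def switching_threshold(v_dd, k_n, k_p, v_tn, v_tp, v_dsatn, v_dsatp):
    """Eq. (5.3); the PMOS quantities k_p, v_tp, v_dsatp are negative."""
    r = (k_p * v_dsatp) / (k_n * v_dsatn)
    v_m = ((v_tn + v_dsatn / 2.0)
           + r * (v_dd + v_tp + v_dsatp / 2.0)) / (1.0 + r)
    return v_m, r


def switching_threshold_approx(v_dd, r):
    """Eq. (5.4): rough estimate when V_DD is large compared to V_T and V_DSAT."""
    return r * v_dd / (1.0 + r)


K_N = 115e-6 * 1.5          # k'_n * (W/L)_n
K_P = -30e-6 * 1.5 * 3.4    # k'_p * (W/L)_p, negative for the PMOS
v_m, r = switching_threshold(2.5, K_N, K_P, 0.43, -0.40, 0.63, -1.00)
print(f"r = {r:.2f}, V_M = {v_m:.3f} V, "
      f"Eq. (5.4) estimate = {switching_threshold_approx(2.5, r):.3f} V")
```

At a 2.5 V supply the thresholds and saturation voltages are not negligible, so the Eq. (5.4) estimate differs noticeably from Eq. (5.3); the simplified form is best read as a trend indicator.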

From Eq. (5.2), we can derive the required ratio of PMOS versus NMOS transistor 
sizes such that the switching threshold is set to a desired value V M . When using this 
expression, please make sure that the assumption that both devices are velocity-saturated 
still holds for the chosen operation point. 



$$
\frac{(W/L)_p}{(W/L)_n} = \frac{k'_n V_{DSATn}\left(V_M - V_{Tn} - V_{DSATn}/2\right)}{k'_p V_{DSATp}\left(V_{DD} - V_M + V_{Tp} + V_{DSATp}/2\right)} \tag{5.5}
$$



Problem 5.1 Inverter switching threshold for long-channel devices, or low supply-volt- 
ages. 

The above expressions were derived under the assumption that the transistors are
velocity-saturated. When the PMOS and NMOS are long-channel devices, or when the supply
voltage is low, velocity saturation does not occur (V_M - V_T < V_DSAT). Under these
circumstances, Eq. (5.6) holds for V_M. Derive.

$$
V_M = \frac{V_{Tn} + r\left(V_{DD} + V_{Tp}\right)}{1 + r}
\quad \text{with} \quad
r = \sqrt{\frac{-k_p}{k_n}} \tag{5.6}
$$

Design Technique: Maximizing the noise margins



When designing static CMOS circuits, it is advisable to balance the driving strengths of the 
transistors by making the PMOS section wider than the NMOS section, if one wants to maxi- 
mize the noise margins and obtain symmetrical characteristics. The required ratio is given by 
Eq. (5.5). 



Example 5.1 Switching threshold of CMOS inverter 

We derive the sizes of PMOS and NMOS transistors such that the switching threshold of a
CMOS inverter, implemented in our generic 0.25 μm CMOS process, is located in the middle
between the supply rails. We use the process parameters presented in Example 3.7, and
assume a supply voltage of 2.5 V. The minimum size device has a width/length ratio of 1.5.
With the aid of Eq. (5.5), we find

$$
\frac{(W/L)_p}{(W/L)_n} = \frac{115 \times 10^{-6}}{30 \times 10^{-6}} \times \frac{0.63}{1.0} \times \frac{(1.25 - 0.43 - 0.63/2)}{(1.25 - 0.4 - 1.0/2)} = 3.5
$$

Figure 5.7 plots the values of switching threshold as a function of the PMOS/NMOS
ratio, as obtained by circuit simulation. The simulated PMOS/NMOS ratio of 3.4 for a 1.25 V
switching threshold confirms the value predicted by Eq. (5.5).
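
The arithmetic of this example can be reproduced with a few lines of Python. This is a sketch of Eq. (5.5) only; the function name is hypothetical, and the PMOS parameters are entered with the negative signs assumed by the equation.

```python
def pmos_nmos_ratio(v_m, v_dd, kp_n, kp_p, v_tn, v_tp, v_dsatn, v_dsatp):
    """Eq. (5.5): (W/L)_p / (W/L)_n for a target V_M; PMOS values are negative."""
    num = kp_n * v_dsatn * (v_m - v_tn - v_dsatn / 2.0)
    den = kp_p * v_dsatp * (v_dd - v_m + v_tp + v_dsatp / 2.0)
    return num / den


# Parameters as used in Example 5.1 (V_DD = 2.5 V, target V_M = 1.25 V)
ratio = pmos_nmos_ratio(1.25, 2.5, 115e-6, -30e-6, 0.43, -0.40, 0.63, -1.00)
print(f"(W/L)_p / (W/L)_n = {ratio:.2f}")
```

The result rounds to the 3.5 found above. When trying other target thresholds, keep in mind that Eq. (5.5) only applies while both devices remain velocity-saturated.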



An analysis of the curve of Figure 5.7 produces some interesting observations: 

1. V M is relatively insensitive to variations in the device ratio. This means that small 
variations of the ratio (e.g., making it 3 or 2.5) do not disturb the transfer character- 
istic that much. It is therefore an accepted practice in industrial designs to set the 
width of the PMOS transistor to values smaller than those required for exact sym- 
metry. For the above example, setting the ratio to 3, 2.5, and 2 yields switching 
thresholds of 1.22 V, 1.18 V, and 1.13 V, respectively.

Figure 5.8 Changing the inverter threshold can improve the circuit reliability.



2. The effect of changing the W_p/W_n ratio is to shift the transition region of the VTC.
Increasing the width of the PMOS or the NMOS moves V_M towards V_DD or GND,
respectively. This property can be very useful, as asymmetrical transfer characteristics
are actually desirable in some designs. This is demonstrated by the example of
Figure 5.8. The incoming signal V_in has a very noisy zero value. Passing this signal
through a symmetrical inverter would lead to erroneous values (Figure 5.8a). This
can be addressed by raising the threshold of the inverter, which results in a correct
response (Figure 5.8b). Further in the text, we will see other circuit instances where
inverters with asymmetrical switching thresholds are desirable.

Changing the switching threshold by a considerable amount is, however, not easy,
especially when the ratio of supply voltage to transistor threshold is relatively small
(2.5/0.4 ≈ 6 for our particular example). To move the threshold to 1.5 V requires a
transistor ratio of 11, and further increases are prohibitively expensive. Observe that
Figure 5.7 is plotted in a semi-log format.



5.3.2 Noise Margins 





By definition, V_IH and V_IL are the operational points of the inverter where
dV_out/dV_in = -1. In the terminology of the analog circuit designer, these are the points
where the gain g of the amplifier, formed by the inverter, is equal to -1. While it is indeed
possible to derive analytical expressions for V_IH and V_IL, these tend to be unwieldy and
provide little insight into what parameters are instrumental in setting the noise margins.

A simpler approach is to use a piecewise linear approximation for the VTC, as
shown in Figure 5.9. The transition region is approximated by a straight line, the gain of
which equals the gain g at the switching threshold V_M. The crossover with the V_OH and the
V_OL lines is used to define the V_IH and V_IL points. The error introduced is small and well
within the range of what is required for an initial design. This approach yields the following
expressions for the width of the transition region V_IH - V_IL, V_IH, V_IL, and the noise
margins NM_H and NM_L.

Figure 5.9 A piecewise linear approximation of the VTC simplifies the
derivation of V_IL and V_IH.




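$$
\begin{aligned}
V_{IH} - V_{IL} &= -\frac{V_{OH} - V_{OL}}{g} = \frac{-V_{DD}}{g} \\
V_{IH} &= V_M - \frac{V_M}{g}; \qquad V_{IL} = V_M + \frac{V_{DD} - V_M}{g} \\
NM_H &= V_{DD} - V_{IH}; \qquad NM_L = V_{IL}
\end{aligned}
\tag{5.7}
$$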



These expressions make it increasingly clear that a high gain in the transition region is
very desirable. In the extreme case of an infinite gain, the noise margins simplify to V_OH -
V_M and V_M - V_OL for NM_H and NM_L, respectively, and span the complete voltage swing.
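
The piecewise linear estimates of Eq. (5.7) can be tabulated for a few gain values to see how the noise margins approach the infinite-gain limit. The Python sketch below is illustrative: the function name is hypothetical, V_OH = V_DD and V_OL = 0 are assumed as in the text, and the gains in the loop are sample values (−27.5 is the figure that appears in Example 5.2).

```python
def noise_margins(v_m, v_dd, gain):
    """Piecewise linear estimates of Eq. (5.7); gain must be negative."""
    v_ih = v_m - v_m / gain
    v_il = v_m + (v_dd - v_m) / gain
    return {"V_IH": v_ih, "V_IL": v_il, "NM_H": v_dd - v_ih, "NM_L": v_il}


for g in (-5.0, -27.5, -1000.0):  # illustrative gains
    nm = noise_margins(v_m=1.25, v_dd=2.5, gain=g)
    print(g, {name: round(value, 3) for name, value in nm.items()})
```

As the magnitude of the gain grows, V_IH and V_IL both converge on V_M and each noise margin approaches half the supply voltage for a centered threshold.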

It remains to determine the midpoint gain of the static CMOS inverter. We assume
once again that both PMOS and NMOS are velocity-saturated. It is apparent from Figure
5.4 that the gain is a strong function of the slopes of the currents in the saturation region.
The channel-length modulation factor hence cannot be ignored in this analysis, since doing so
would lead to an infinite gain. The gain can now be derived by differentiating the current
equation (5.8), valid around the switching threshold, with respect to V_in.



$$
k_n V_{DSATn}\left(V_{in} - V_{Tn} - \frac{V_{DSATn}}{2}\right)\left(1 + \lambda_n V_{out}\right)
+ k_p V_{DSATp}\left(V_{in} - V_{DD} - V_{Tp} - \frac{V_{DSATp}}{2}\right)\left(1 + \lambda_p V_{out} - \lambda_p V_{DD}\right) = 0 \tag{5.8}
$$

Differentiation and solving for dV_out/dV_in yields

$$
\frac{dV_{out}}{dV_{in}} = -\frac{k_n V_{DSATn}\left(1 + \lambda_n V_{out}\right) + k_p V_{DSATp}\left(1 + \lambda_p V_{out} - \lambda_p V_{DD}\right)}{\lambda_n k_n V_{DSATn}\left(V_{in} - V_{Tn} - V_{DSATn}/2\right) + \lambda_p k_p V_{DSATp}\left(V_{in} - V_{DD} - V_{Tp} - V_{DSATp}/2\right)} \tag{5.9}
$$

Ignoring some second-order terms, and setting V_in = V_M results in the gain expression

$$
g = -\frac{1}{I_D(V_M)} \, \frac{k_n V_{DSATn} + k_p V_{DSATp}}{\lambda_n - \lambda_p}
\approx -\frac{1 + r}{\left(V_M - V_{Tn} - V_{DSATn}/2\right)\left(\lambda_n - \lambda_p\right)} \tag{5.10}
$$

with I_D(V_M) the current flowing through the inverter for V_in = V_M. The gain is almost
purely determined by technology parameters, especially the channel-length modulation. It
can only in a minor way be influenced by the designer through the choice of supply and
switching threshold voltages.



Example 5.2 Voltage transfer characteristic and noise margins of CMOS Inverter 



Assume an inverter in the generic 0.25 μm CMOS technology designed with a PMOS/NMOS
ratio of 3.4 and with the NMOS transistor minimum size (W = 0.375 μm, L = 0.25 μm, W/L =
1.5). We first compute the gain at V_M (= 1.25 V),

$$
I_D(V_M) = 1.5 \times 115 \times 10^{-6} \times 0.63 \times (1.25 - 0.43 - 0.63/2) \times (1 + 0.06 \times 1.25) = 59 \times 10^{-6}\ \text{A}
$$

$$
g = -\frac{1}{59 \times 10^{-6}} \cdot \frac{1.5 \times 115 \times 10^{-6} \times 0.63 + 1.5 \times 3.4 \times 30 \times 10^{-6} \times 1.0}{0.06 + 0.1} = -27.5 \quad \text{(Eq. 5.10)}
$$

This yields the following values for V_IL, V_IH, NM_L, NM_H:

V_IL = 1.2 V, V_IH = 1.3 V, NM_L = NM_H = 1.2 V.
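
The gain calculation above can be cross-checked with a short Python sketch of the first form of Eq. (5.10). The function name is hypothetical; PMOS quantities are entered as negative numbers, and I_D(V_M) includes the (1 + λ_n V_M) factor exactly as in the hand calculation. Evaluated without intermediate rounding, the gain lands within about one percent of the quoted value.

```python
def midpoint_gain(k_n, k_p, v_dsatn, v_dsatp, lam_n, lam_p, v_m, v_tn):
    """Eq. (5.10), first form; k_p, v_dsatp and lam_p are negative (PMOS)."""
    i_d = k_n * v_dsatn * (v_m - v_tn - v_dsatn / 2.0) * (1.0 + lam_n * v_m)
    gain = -(k_n * v_dsatn + k_p * v_dsatp) / (i_d * (lam_n - lam_p))
    return i_d, gain


i_d, g = midpoint_gain(
    k_n=1.5 * 115e-6,           # k'_n * (W/L)_n
    k_p=-1.5 * 3.4 * 30e-6,     # k'_p * (W/L)_p
    v_dsatn=0.63, v_dsatp=-1.0,
    lam_n=0.06, lam_p=-0.1,
    v_m=1.25, v_tn=0.43,
)
print(f"I_D(V_M) = {i_d * 1e6:.1f} uA, g = {g:.1f}")
```

Because λ_n − λ_p appears in the denominator, even a modest change in the channel-length modulation parameters shifts the estimated gain substantially, consistent with the remark that the gain is set mostly by technology.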



Figure 5.10 plots the simulated VTC of the inverter, as well as its derivative, the gain. A close
to ideal characteristic is obtained. The actual values of V_IL and V_IH are 1.03 V and 1.45 V,
respectively, which leads to noise margins of 1.03 V and 1.05 V. These values are lower than
those predicted for two reasons:

• Eq. (5.10) overestimates the gain. As observed in Figure 5.10b, the maximum gain (at
V_M) equals only 17. This reduced gain would yield values for V_IL and V_IH of 1.17 V and 1.33
V, respectively.

• The most important deviation is due to the piecewise linear approximation of the 
VTC, which is optimistic with respect to the actual noise margins. 

The obtained expressions are however perfectly useful as first-order estimations as 
well as means of identifying the relevant parameters and their impact. 

To conclude this example, we also extracted from simulations the output resistance of 
the inverter in the low- and high-output states. Low values of 2.4 kfl and 3.3 k£2 were 
observed, respectively. The output resistance is a good measure of the sensitivity of the gate 
in respect to noise induced at the output, and is preferably as low as possible. 




SIDELINE: Surprisingly (or not so surprisingly), the static CMOS inverter can also be 
used as an analog amplifier, as it has a fairly high gain in its transition region. This region 
is very narrow however, as is apparent in the graph of Figure 5.10b. It also receives poor 
marks on other amplifier properties such as supply noise rejection. Yet, this observation 
can be used to demonstrate one of the major differences between analog and digital 
design. Where the analog designer would bias the amplifier in the middle of the transition
region, so that a maximum linearity is obtained, the digital designer will operate the
device in the regions of extreme nonlinearity, resulting in well-defined and well-separated
high and low signals.

Figure 5.10 Simulated Voltage Transfer Characteristic (a) and voltage gain (b) of CMOS inverter (0.25 μm CMOS, $V_{DD}$ = 2.5 V).



Problem 5.2 Inverter noise margins for long-channel devices 

Derive expressions for the gain and noise margins assuming that PMOS and NMOS are 
long-channel devices (or that the supply voltage is low), so that velocity saturation does 
not occur. 




5.3.3 Robustness Revisited 
Device Variations 

While we design a gate for nominal operation conditions and typical device parameters, 
we should always be aware that the actual operating temperature might vary over a large
range, and that the device parameters after fabrication probably will deviate from the nom- 
inal values we used in our design optimization process. Fortunately, the dc-characteristics 
of the static CMOS inverter turn out to be rather insensitive to these variations, and the 
gate remains functional over a wide range of operating conditions. This already became 
apparent in Figure 5.7, which shows that variations in the device sizes have only a minor 
impact on the switching threshold of the inverter. To further confirm the assumed robust- 
ness of the gate, we have re-simulated the voltage transfer characteristic by replacing the 
nominal devices by their worst- or best-case incarnations. Two corner-cases are plotted in 
Figure 5.11: a better-than-expected NMOS combined with an inferior PMOS, and the 
opposite scenario. Comparing the resulting curves with the nominal response shows that 
the variations mostly cause a shift in the switching threshold, but that the operation of the
gate is by no means affected. This robust behavior that ensures functionality of the gate
over a wide range of conditions has contributed in a big way to the popularity of the static
CMOS gate.

Figure 5.11 Impact of device variations on static CMOS
inverter VTC. The “good” device has a smaller oxide
thickness (-3 nm), a smaller length (-25 nm), a higher width
(+30 nm), and a smaller threshold (-60 mV). The opposite
is true for the “bad” transistor.



Scaling the Supply Voltage 

In Chapter 3, we observed that continuing technology scaling forces the supply voltages to 
reduce at rates similar to the device dimensions. At the same time, device threshold volt- 
ages are virtually kept constant. The reader probably wonders about the impact of this 
trend on the integrity parameters of the CMOS inverter. Do inverters keep on working 
when the voltages are scaled and are there potential limits to the supply scaling? 

A first hint of what might happen was offered in Eq. (5.10), which indicates that the
gain of the inverter in the transition region actually increases with a reduction of the supply
voltage! Note that for a fixed transistor ratio r, $V_M$ is approximately proportional to
$V_{DD}$. Plotting the (normalized) VTC for different supply voltages not only confirms this
conjecture, but even shows that the inverter is alive and well for supply voltages close to
the threshold voltage of the composing transistors (Figure 5.12a). At a voltage of 0.5 V,
which is just 100 mV above the threshold of the transistors, the width of the transition
region measures only 10% of the supply voltage (for a maximum gain of 35), while it widens
to 17% for 2.5 V. So, given this improvement in dc characteristics, why do we not
choose to operate all our digital circuits at these low supply voltages? Three important
arguments come to mind:

• In the following sections, we will learn that reducing the supply voltage indiscriminately
has a positive impact on the energy dissipation, but is absolutely detrimental
to the performance of the gate.

• The dc-characteristic becomes increasingly sensitive to variations in the device 
parameters such as the transistor threshold, once supply voltages and intrinsic volt- 
ages become comparable. 

• Scaling the supply voltage means reducing the signal swing. While this typically 
helps to reduce the internal noise in the system (such as caused by crosstalk), it 
makes the design more sensitive to external noise sources that do not scale. 



Figure 5.12 VTC of CMOS inverter as a function of supply voltage (0.25 μm CMOS technology). (a) Reducing $V_{DD}$ improves the gain... (b) but it deteriorates for very low supply voltages.

To provide an insight into the question on potential limits to the voltage scaling, we 
have plotted in Figure 5.12b the voltage transfer characteristic of the same inverter for the 
even-lower supply voltages of 200 mV, 100 mV, and 50 mV (while keeping the transistor 
thresholds at the same level). Amazingly enough, we still obtain an inverter characteristic,
even though the supply voltage is not large enough to turn the transistors on! The explanation
can be found in the sub-threshold operation of the transistors. The sub-threshold
currents are sufficient to switch the gate between low and high levels, and provide enough 
gain to produce acceptable VTCs. The very low value of the switching currents ensures a 
very slow operation but this might be acceptable for some applications (such as watches, 
for example). 

At around 100 mV, we start observing a major deterioration of the gate characteristic.
$V_{OL}$ and $V_{OH}$ are no longer at the supply rails and the transition-region gain approaches 1.
The latter turns out to be a fundamental show-stopper. To achieve sufficient gain for
use in a digital circuit, the supply must be at least a couple of times $\phi_T = kT/q$
(= 25 mV at room temperature), the thermal voltage introduced in Chapter 3
[Swanson72]. It turns out that below this same voltage, thermal noise becomes an issue as
well, potentially resulting in unreliable operation.

$$V_{DDmin} > 2 \ldots 4\,\frac{kT}{q} \tag{5.11}$$

Eq. (5.11) presents a true lower bound on supply scaling. It suggests that the only way to
get CMOS inverters to operate below 100 mV is to reduce the ambient temperature, or in
other words to cool the circuit.
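
To get a feel for the numbers behind this bound, the sketch below evaluates the thermal voltage kT/q and the corresponding supply floor at two temperatures. It assumes that "a couple of times" the thermal voltage means a factor between 2 and 4; that range, the 77 K operating point, and the function names are illustrative choices rather than values taken from the text.

```python
K_BOLTZMANN = 1.380649e-23     # J/K
Q_ELECTRON = 1.602176634e-19   # C


def thermal_voltage(temp_kelvin):
    """phi_T = kT/q in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def min_supply_range(temp_kelvin, low_factor=2.0, high_factor=4.0):
    """Assumed reading of the supply bound: between 2 and 4 thermal voltages."""
    phi_t = thermal_voltage(temp_kelvin)
    return low_factor * phi_t, high_factor * phi_t


for temp in (300.0, 77.0):   # room temperature and liquid-nitrogen cooling
    low, high = min_supply_range(temp)
    print(f"T = {temp:5.1f} K: phi_T = {thermal_voltage(temp) * 1e3:5.2f} mV, "
          f"V_DDmin between {low * 1e3:5.1f} and {high * 1e3:5.1f} mV")
```

At room temperature the upper end of this range lands close to 100 mV, which matches the point where the simulated characteristic starts to degrade. Cooling lowers the floor in direct proportion to the absolute temperature.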






Problem 5.3 Minimum supply voltage of CMOS inverter 

Once the supply voltage drops below the threshold voltage, the transistors operate in the
sub-threshold region, and display an exponential current-voltage relationship (as expressed in
Eq. (3.40)). Derive an expression for the gain of the inverter under these circumstances
(assume symmetrical NMOS and PMOS transistors, and a maximum gain at $V_M = V_{DD}/2$).
The resulting expression demonstrates that the minimum voltage is a function of the slope
factor n of the transistor.

$$g = -\frac{1}{n}\left(e^{V_{DD}/2\phi_T} - 1\right) \tag{5.12}$$

According to this expression, the gain drops to -1 at $V_{DD}$ = 48 mV (for n = 1.5 and $\phi_T$ = 25
mV).



5.4 Performance of CMOS Inverter: The Dynamic Behavior 

The qualitative analysis presented earlier concluded that the propagation delay of the
CMOS inverter is determined by the time it takes to charge and discharge the load capacitor
$C_L$ through the PMOS and NMOS transistors, respectively. This observation suggests
that getting $C_L$ as small as possible is crucial to the realization of high-performance
CMOS circuits. It is hence worthwhile to first study the major components of the load
capacitance before embarking on an in-depth analysis of the propagation delay of the
gate. In addition to this detailed analysis, the section also presents a summary of techniques
that a designer might use to optimize the performance of the inverter.

5.4.1 Computing the Capacitances 

Manual analysis of MOS circuits where each capacitor is considered individually is virtually
impossible and is exacerbated by the many nonlinear capacitances in the MOS transistor
model. To make the analysis tractable, we assume that all capacitances are lumped
together into one single capacitor $C_L$, located between $V_{out}$ and GND. Be aware that this is
a considerable simplification of the actual situation, even in the case of a simple inverter.



Figure 5.13 Parasitic capacitances, influencing the transient behavior of the cascaded inverter pair.

Figure 5.13 shows the schematic of a cascaded inverter pair. It includes all the
capacitances influencing the transient response of node $V_{out}$. It is initially assumed that the
input $V_{in}$ is driven by an ideal voltage source with zero rise and fall times. Accounting
only for capacitances connected to the output node, $C_L$ breaks down into the following
components.

Gate-Drain Capacitance $C_{gd12}$

M1 and M2 are either in cut-off or in the saturation mode during the first half (up to 50%
point) of the output transient. Under these circumstances, the only contributions to $C_{gd12}$
are the overlap capacitances of both M1 and M2. The channel capacitance of the MOS
transistors does not play a role here, as it is located either completely between gate and
bulk (cut-off) or gate and source (saturation) (see Chapter 3).

The lumped capacitor model now requires that this floating gate-drain capacitor be
replaced by a capacitance-to-ground. This is accomplished by taking the so-called Miller
effect into account. During a low-high or high-low transition, the terminals of the gate-drain
capacitor are moving in opposite directions (Figure 5.14). The voltage change over
the floating capacitor is hence twice the actual output voltage swing. To present an identical
load to the output node, the capacitance-to-ground must have a value that is twice as
large as the floating capacitance.

We use the following equation for the gate-drain capacitors: $C_{gd} = 2\,CGDO\,W$ (with
CGDO the overlap capacitance per unit width as used in the SPICE model). For an in-depth
discussion of the Miller effect, please refer to textbooks such as Sedra and Smith
([Sedra87], p. 57).¹




Figure 5.14 The Miller effect — A capacitor experiencing identical but opposite voltage swings at both 
its terminals can be replaced by a capacitor to ground, whose value is two times the original value. 





¹ The Miller effect discussed in this context is a simplified version of the general analog case. In a digital
inverter, the large-scale gain between input and output always equals -1.

Diffusion Capacitances $C_{db1}$ and $C_{db2}$

The capacitance between drain and bulk is due to the reverse-biased pn-junction. Such a
capacitor is, unfortunately, quite nonlinear and depends heavily on the applied voltage.
We argued in Chapter 3 that the best approach towards simplifying the analysis is to
replace the nonlinear capacitor by a linear one with the same change in charge for the voltage
range of interest. A multiplication factor $K_{eq}$ is introduced to relate the linearized
capacitor to the value of the junction capacitance under zero-bias conditions.

$$C_{eq} = K_{eq} C_{j0} \tag{5.13}$$

with $C_{j0}$ the junction capacitance per unit area under zero-bias conditions. An expression
for $K_{eq}$ was derived in Eq. (3.11) and is repeated here for convenience

$$K_{eq} = \frac{-\phi_0^{m}}{(V_{high} - V_{low})(1 - m)}\left[(\phi_0 - V_{high})^{1-m} - (\phi_0 - V_{low})^{1-m}\right] \tag{5.14}$$

with $\phi_0$ the built-in junction potential and m the grading coefficient of the junction.
Observe that the junction voltage is defined to be negative for reverse-biased junctions.



Example 5.3 $K_{eq}$ for a 2.5 V CMOS Inverter

Consider the inverter of Figure 5.13 designed in the generic 0.25 μm CMOS technology. The
relevant capacitance parameters for this process were summarized in Table 3.5.

Let us first analyze the NMOS transistor ($C_{db1}$ in Figure 5.13). The propagation delay
is defined by the time between the 50% transitions of the input and the output. For the CMOS
inverter, this is the time-instance where $V_{out}$ reaches 1.25 V, as the output voltage swing goes
from rail to rail or equals 2.5 V. We, therefore, linearize the junction capacitance over the
interval {2.5 V, 1.25 V} for the high-to-low transition, and {0, 1.25 V} for the low-to-high
transition.

During the high-to-low transition at the output, $V_{out}$ initially equals 2.5 V. Because the
bulk of the NMOS device is connected to GND, this translates into a reverse voltage of 2.5 V
over the drain junction or $V_{high}$ = -2.5 V. At the 50% point, $V_{out}$ = 1.25 V or $V_{low}$ = -1.25 V.
Evaluating Eq. (5.14) for the bottom plate and sidewall components of the diffusion capacitance
yields

Bottom plate: $K_{eq}$ (m = 0.5, $\phi_0$ = 0.9) = 0.57,
Sidewall: $K_{eqsw}$ (m = 0.44, $\phi_0$ = 0.9) = 0.61

During the low-to-high transition, $V_{low}$ and $V_{high}$ equal 0 V and -1.25 V, respectively,
resulting in higher values for $K_{eq}$,

Bottom plate: $K_{eq}$ (m = 0.5, $\phi_0$ = 0.9) = 0.79,
Sidewall: $K_{eqsw}$ (m = 0.44, $\phi_0$ = 0.9) = 0.81

The PMOS transistor displays a reverse behavior, as its substrate is connected to 2.5 V.
Hence, for the high-to-low transition ($V_{low}$ = 0, $V_{high}$ = -1.25 V),

Bottom plate: $K_{eq}$ (m = 0.48, $\phi_0$ = 0.9) = 0.79,
Sidewall: $K_{eqsw}$ (m = 0.32, $\phi_0$ = 0.9) = 0.86

and for the low-to-high transition ($V_{low}$ = -1.25 V, $V_{high}$ = -2.5 V)

Bottom plate: $K_{eq}$ (m = 0.48, $\phi_0$ = 0.9) = 0.59,
Sidewall: $K_{eqsw}$ (m = 0.32, $\phi_0$ = 0.9) = 0.7
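
The eight factors of this example follow directly from Eq. (5.14), so they are easy to check numerically. The sketch below is a minimal implementation of that formula; the function name `k_eq` and the dictionary layout are illustrative choices, and the junction voltages are entered as negative numbers for reverse bias, as the text prescribes.

```python
def k_eq(v_high, v_low, phi0, m):
    """Linearization factor of Eq. (5.14); reverse-bias voltages are negative."""
    prefactor = -phi0 ** m / ((v_high - v_low) * (1 - m))
    return prefactor * ((phi0 - v_high) ** (1 - m) - (phi0 - v_low) ** (1 - m))


PHI0 = 0.9  # built-in junction potential in volts, as used in the example

factors = {
    # NMOS drain junction (bulk at GND)
    "NMOS bottom plate, H->L": k_eq(-2.5, -1.25, PHI0, 0.5),
    "NMOS sidewall,     H->L": k_eq(-2.5, -1.25, PHI0, 0.44),
    "NMOS bottom plate, L->H": k_eq(-1.25, 0.0, PHI0, 0.5),
    "NMOS sidewall,     L->H": k_eq(-1.25, 0.0, PHI0, 0.44),
    # PMOS drain junction (substrate at 2.5 V)
    "PMOS bottom plate, H->L": k_eq(-1.25, 0.0, PHI0, 0.48),
    "PMOS sidewall,     H->L": k_eq(-1.25, 0.0, PHI0, 0.32),
    "PMOS bottom plate, L->H": k_eq(-2.5, -1.25, PHI0, 0.48),
    "PMOS sidewall,     L->H": k_eq(-2.5, -1.25, PHI0, 0.32),
}

for name, value in factors.items():
    print(f"{name}: K_eq = {value:.2f}")
```

The printed values agree with the ones listed in the example to within rounding in the second decimal.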




Using this approach, the junction capacitance can be replaced by a linear component 
and treated as any other device capacitance. The result of the linearization is a minor dis- 
tortion of the voltage waveforms. The logic delays are not significantly influenced by this 
simplification.

Wiring Capacitance $C_w$

The capacitance due to the wiring depends upon the length and width of the connecting 
wires, and is a function of the distance of the fanout from the driving gate and the number 
of fanout gates. As argued in Chapter 4, this component is growing in importance with the 
scaling of the technology. 

Gate Capacitance of Fanout $C_{g3}$ and $C_{g4}$

We assume that the fanout capacitance equals the total gate capacitance of the loading
gates M3 and M4. Hence,

$$\begin{aligned} C_{fanout} &= C_{gate}(\mathrm{NMOS}) + C_{gate}(\mathrm{PMOS}) \\ &= (C_{GSOn} + C_{GDOn} + W_n L_n C_{ox}) + (C_{GSOp} + C_{GDOp} + W_p L_p C_{ox}) \end{aligned} \tag{5.15}$$

This expression simplifies the actual situation in two ways: 

• It assumes that all components of the gate capacitance are connected between $V_{out}$
and GND (or $V_{DD}$), and ignores the Miller effect on the gate-drain capacitances. This
has a relatively minor effect on the accuracy, since we can safely assume that the
connecting gate does not switch before the 50% point is reached, and $V_{out2}$, therefore,
remains constant in the interval of interest.

• A second approximation is that the channel capacitance of the connecting gate is
constant over the interval of interest. This is not exactly the case as we discovered in
Chapter 3. The total channel capacitance is a function of the operation mode of the
device, and varies from approximately (2/3) $W L C_{ox}$ (saturation) to the full $W L C_{ox}$
(linear and cut-off). A drop in overall gate capacitance also occurs just before the transistor
turns on (Figure 3.30). During the first half of the transient, it may be assumed
that one of the load devices is always in linear mode, while the other transistor
evolves from the off-mode to saturation. Ignoring the capacitance variation results
in a pessimistic estimation with an error of approximately 10%, which is acceptable
for a first-order analysis.





Example 5.4 Capacitances of a 0.25 μm CMOS Inverter

A minimum-size, symmetrical CMOS inverter has been designed in the 0.25 μm CMOS technology.
The layout is shown in Figure 5.15. The supply voltage $V_{DD}$ is set to 2.5 V. From the
layout, we derive the transistor sizes, diffusion areas, and perimeters. This data is summarized
in Table 5.1. As an example, we will derive the drain area and perimeter for the NMOS transistor.
The drain area is formed by the metal-diffusion contact, which has an area of 4 × 4 λ²,
and the rectangle between contact and gate, which has an area of 3 × 1 λ². This results in a
total area of 19 λ², or 0.30 μm² (as λ = 0.125 μm). The perimeter of the drain area is rather
involved and consists of the following components (going counterclockwise): 5 + 4 + 4 + 1 +
1 = 15 λ or PD = 15 × 0.125 = 1.875 μm. Notice that the gate side of the drain perimeter is not
included, as this is not considered a part of the side-wall. The drain area and perimeter of the
PMOS transistor are derived similarly (the rectangular shape makes the exercise considerably
simpler): AD = 5 × 9 λ² = 45 λ², or 0.7 μm²; PD = 5 + 9 + 5 = 19 λ, or 2.375 μm.
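
The conversion from layout units to physical dimensions used above is simple enough to script. The following sketch repeats the drain area and perimeter bookkeeping for both transistors, taking λ = 0.125 μm from the text; the helper names are hypothetical.

```python
LAMBDA_UM = 0.125  # layout unit lambda in micrometers for the 0.25 um process


def area_um2(area_in_lambda2):
    """Convert an area expressed in lambda^2 to square micrometers."""
    return area_in_lambda2 * LAMBDA_UM ** 2


def length_um(length_in_lambda):
    """Convert a length expressed in lambda to micrometers."""
    return length_in_lambda * LAMBDA_UM


# NMOS drain: 4 x 4 contact plus the 3 x 1 strip between contact and gate
nmos_ad = 4 * 4 + 3 * 1
# Perimeter segments going counterclockwise; the gate side is not counted
nmos_pd = 5 + 4 + 4 + 1 + 1

# PMOS drain: a plain 5 x 9 rectangle, again without the gate side
pmos_ad = 5 * 9
pmos_pd = 5 + 9 + 5

for name, ad, pd in (("NMOS", nmos_ad, nmos_pd), ("PMOS", pmos_ad, pmos_pd)):
    print(f"{name}: AD = {ad} lambda^2 = {area_um2(ad):.2f} um^2, "
          f"PD = {pd} lambda = {length_um(pd):.3f} um")
```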












Figure 5.15 Layout of two chained, minimum-size inverters using SCMOS Design Rules (see also 
Color-plate 6). 



Table 5.1 Inverter transistor data.

        W/L          AD (μm²)       PD (μm)         AS (μm²)       PS (μm)
NMOS    0.375/0.25   0.3 (19 λ²)    1.875 (15 λ)    0.3 (19 λ²)    1.875 (15 λ)
PMOS    1.125/0.25   0.7 (45 λ²)    2.375 (19 λ)    0.7 (45 λ²)    2.375 (19 λ)



This physical information can be combined with the approximations derived above to
come up with an estimation of $C_L$. The capacitor parameters for our generic process were
summarized in Table 3.5, and repeated here for convenience:

Overlap capacitance: CGDO(NMOS) = 0.31 fF/μm; CGDO(PMOS) = 0.27 fF/μm
Bottom junction capacitance: CJ(NMOS) = 2 fF/μm²; CJ(PMOS) = 1.9 fF/μm²
Side-wall junction capacitance: CJSW(NMOS) = 0.28 fF/μm; CJSW(PMOS) = 0.22 fF/μm
Gate capacitance: $C_{ox}$(NMOS) = $C_{ox}$(PMOS) = 6 fF/μm²



Finally, we should also consider the capacitance contributed by the wire, connecting
the gates and implemented in metal 1 and polysilicon. A layout extraction program typically
will deliver us precise values for this parasitic capacitance. Inspection of the layout helps us
to form a first-order estimate and yields that the metal-1 and polysilicon areas of the wire, that
are not over active diffusion, equal 42 λ² and 72 λ², respectively. With the aid of the interconnect
parameters of Table 4.2, we find the wire capacitance (observe that we ignore the
fringing capacitance in this simple exercise). Due to the short length of the wire, this contribution
is ignorable compared to the other parasitics.

$$C_{wire} = 42/8^2\ \mu\mathrm{m}^2 \times 30\ \mathrm{aF}/\mu\mathrm{m}^2 + 72/8^2\ \mu\mathrm{m}^2 \times 88\ \mathrm{aF}/\mu\mathrm{m}^2 = 0.12\ \mathrm{fF}$$

Bringing all the components together results in Table 5.2. We use the values of $K_{eq}$
derived in Example 5.3 for the computation of the diffusion capacitances. Notice that the load
capacitance is almost evenly split between its two major components: the intrinsic capacitance,
composed of diffusion and overlap capacitances, and the extrinsic load capacitance,
contributed by wire and connecting gate.



Table 5.2 Components of $C_L$ (for high-to-low and low-to-high transitions).

Capacitor   Expression                                 Value (fF) (H→L)   Value (fF) (L→H)
C_gd1       2 CGDO_n W_n                               0.23               0.23
C_gd2       2 CGDO_p W_p                               0.61               0.61
C_db1       K_eqn AD_n CJ + K_eqswn PD_n CJSW          0.66               0.90
C_db2       K_eqp AD_p CJ + K_eqswp PD_p CJSW          1.5                1.15
C_g3        (CGDO_n + CGSO_n) W_n + C_ox W_n L_n       0.76               0.76
C_g4        (CGDO_p + CGSO_p) W_p + C_ox W_p L_p       2.28               2.28
C_w         From Extraction                            0.12               0.12
C_L         Σ                                          6.1                6.0
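
The tally of Table 5.2 can be retraced with a few lines of code. The sketch below recomputes the overlap (Miller) and diffusion terms from the process parameters and the $K_{eq}$ factors of Example 5.3, and then adds the fanout gate and wire capacitances. Note the assumption: the gate capacitances $C_{g3}$, $C_{g4}$ and the wire capacitance are copied from the table rather than recomputed, since the table values depend on details not restated here. All names are illustrative.

```python
W_N, W_P = 0.375, 1.125           # transistor widths in um
CGDO_N, CGDO_P = 0.31, 0.27       # overlap capacitance in fF/um
CJ_N, CJ_P = 2.0, 1.9             # bottom junction capacitance in fF/um^2
CJSW_N, CJSW_P = 0.28, 0.22       # side-wall junction capacitance in fF/um
AD_N, PD_N = 0.3, 1.875           # NMOS drain area (um^2) and perimeter (um)
AD_P, PD_P = 0.7, 2.375           # PMOS drain area (um^2) and perimeter (um)

# (bottom plate, sidewall) K_eq factors from Example 5.3
K_EQ = {
    "HL": {"n": (0.57, 0.61), "p": (0.79, 0.86)},
    "LH": {"n": (0.79, 0.81), "p": (0.59, 0.70)},
}

# Copied from Table 5.2 (assumption: not recomputed in this sketch), in fF
C_G3, C_G4, C_WIRE = 0.76, 2.28, 0.12


def load_capacitance(transition):
    """Return the individual components and their sum in fF for 'HL' or 'LH'."""
    kn_bottom, kn_side = K_EQ[transition]["n"]
    kp_bottom, kp_side = K_EQ[transition]["p"]
    parts = {
        "C_gd1": 2 * CGDO_N * W_N,   # factor 2 accounts for the Miller effect
        "C_gd2": 2 * CGDO_P * W_P,
        "C_db1": kn_bottom * AD_N * CJ_N + kn_side * PD_N * CJSW_N,
        "C_db2": kp_bottom * AD_P * CJ_P + kp_side * PD_P * CJSW_P,
        "C_g3": C_G3,
        "C_g4": C_G4,
        "C_w": C_WIRE,
    }
    return parts, sum(parts.values())


for transition in ("HL", "LH"):
    parts, total = load_capacitance(transition)
    summary = ", ".join(f"{name} = {value:.2f}" for name, value in parts.items())
    print(f"{transition}: {summary}, C_L = {total:.2f} fF")
```

Splitting the sum into intrinsic terms (overlap and diffusion) and extrinsic terms (fanout gates and wire) gives roughly 3.0 fF against 3.2 fF for the high-to-low case, which is the near-even split mentioned in the text.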



5.4.2 Propagation Delay: First-Order Analysis 

One way to compute the propagation delay of the inverter is to integrate the capacitor
(dis)charge current. This results in the expression of Eq. (5.16).

$$t_p = \int_{v_1}^{v_2} \frac{C_L(v)}{i(v)}\,dv \tag{5.16}$$

with i the (dis)charging current, v the voltage over the capacitor, and $v_1$ and $v_2$ the initial
and final voltage. An exact computation of this equation is intractable, as both $C_L(v)$ and
i(v) are nonlinear functions of v. We instead fall back on the simplified switch-model of the
inverter introduced in Figure 5.6 to derive a reasonable approximation of the propagation
delay adequate for manual analysis. The voltage-dependencies of the on-resistance and the
load capacitor are addressed by replacing both by a constant linear element with a value
averaged over the interval of interest. The preceding section derived precisely this value
for the load capacitance. An expression for the average on-resistance of the MOS transistor
was already derived in Example 3.8, and is repeated here for convenience.

$$R_{eq} = \frac{1}{V_{DD}/2}\int_{V_{DD}/2}^{V_{DD}} \frac{V}{I_{DSAT}(1 + \lambda V)}\,dV \approx \frac{3}{4}\,\frac{V_{DD}}{I_{DSAT}}\left(1 - \frac{7}{9}\lambda V_{DD}\right) \tag{5.17}$$

with $I_{DSAT} = k'\,\frac{W}{L}\left((V_{DD} - V_T)V_{DSAT} - \frac{V_{DSAT}^2}{2}\right)$.





Deriving the propagation delay of the resulting circuit is now straightforward, and is
nothing more than the analysis of a first-order linear RC-network, identical to the exercise
of Example 4.5. There, we learned that the propagation delay of such a network for a voltage
step at the input is proportional to the time-constant of the network, formed by pull-down
resistor and load capacitance. Hence,

$$t_{pHL} = \ln(2)\,R_{eqn} C_L = 0.69\,R_{eqn} C_L \tag{5.18}$$

Similarly, we can obtain the propagation delay for the low-to-high transition,

$$t_{pLH} = 0.69\,R_{eqp} C_L \tag{5.19}$$

with $R_{eqp}$ the equivalent on-resistance of the PMOS transistor over the interval of interest.
This analysis assumes that the equivalent load-capacitance is identical for both the high-to-low
and low-to-high transitions. This has been shown to be approximately the case in
the example of the previous section. The overall propagation delay of the inverter is
defined as the average of the two values, or

$$t_p = \frac{t_{pHL} + t_{pLH}}{2} = 0.69\,C_L\left(\frac{R_{eqn} + R_{eqp}}{2}\right) \tag{5.20}$$

Very often, it is desirable for a gate to have identical propagation delays for both rising
and falling inputs. This condition can be achieved by making the on-resistance of the
NMOS and PMOS approximately equal. Remember that this condition is identical to the
requirement for a symmetrical VTC.

Example 5.5 Propagation Delay of a 0.25 μm CMOS Inverter

To derive the propagation delays of the CMOS inverter of Figure 5.15, we make use of Eq.
(5.18) and Eq. (5.19). The load capacitance $C_L$ was already computed in Example 5.4, while
the equivalent on-resistances of the transistors for the generic 0.25 μm CMOS process were
derived in Table 3.3. For a supply voltage of 2.5 V, the normalized on-resistances of NMOS
and PMOS transistors equal 13 kΩ and 31 kΩ, respectively. From the layout, we determine
the (W/L) ratios of the transistors to be 1.5 for the NMOS, and 4.5 for the PMOS. We assume
that the difference between drawn and effective dimensions is small enough to be ignorable.
This leads to the following values for the delays:

$$t_{pHL} = 0.69 \times \left(\frac{13\ \mathrm{k\Omega}}{1.5}\right) \times 6.1\ \mathrm{fF} = 36\ \mathrm{psec}$$

$$t_{pLH} = 0.69 \times \left(\frac{31\ \mathrm{k\Omega}}{4.5}\right) \times 6.0\ \mathrm{fF} = 29\ \mathrm{psec}$$

and

$$t_p = \frac{36 + 29}{2} = 32.5\ \mathrm{psec}$$

Figure 5.16 Simulated transient response of the inverter of Figure 5.15.



The accuracy of this analysis is checked by performing a SPICE transient simulation 
on the circuit schematic, extracted from the layout of Figure 5.15. The computed transient 
response of the circuit is plotted in Figure 5.16, and determines the propagation delays to be 
39.9 psec and 31.7 psec for the HL and LH transitions, respectively. The manual results are good
considering the many simplifications made during their derivation. Notice especially the 
overshoots on the simulated output signals. These are caused by the gate-drain capacitances of 
the inverter transistors, which couple the steep voltage step at the input node directly to the 
output before the transistors can even start to react to the changes at the input. These over- 
shoots clearly have a negative impact on the performance of the gate, and explain why the 
simulated delays are larger than the estimations. 
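
The hand calculation of this example amounts to evaluating Eq. (5.18) and Eq. (5.19) with an equivalent resistance scaled by the W/L ratio. The sketch below repeats it and compares the outcome with the simulated delays quoted above; the function name and the way the comparison is printed are illustrative choices.

```python
def propagation_delay(r_unit_ohm, w_over_l, c_load_farad):
    """0.69 * R_eq * C_L, with R_eq the on-resistance of a W/L = 1 device divided by W/L."""
    return 0.69 * (r_unit_ohm / w_over_l) * c_load_farad


t_phl = propagation_delay(13e3, 1.5, 6.1e-15)   # NMOS discharges C_L
t_plh = propagation_delay(31e3, 4.5, 6.0e-15)   # PMOS charges C_L
t_p = (t_phl + t_plh) / 2

# Simulated values quoted in the text, used here only for comparison
spice_phl, spice_plh = 39.9e-12, 31.7e-12

print(f"t_pHL = {t_phl * 1e12:.1f} ps (simulated {spice_phl * 1e12:.1f} ps)")
print(f"t_pLH = {t_plh * 1e12:.1f} ps (simulated {spice_plh * 1e12:.1f} ps)")
print(f"t_p   = {t_p * 1e12:.1f} ps")
print(f"underestimate: {100 * (1 - t_phl / spice_phl):.0f}% (HL), "
      f"{100 * (1 - t_plh / spice_plh):.0f}% (LH)")
```

Both manual estimates fall below the simulated delays by roughly a tenth, consistent with the overshoot explanation given above.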



WARNING: This example might give the impression that manual analysis always leads 
to close approximations of the actual response. This is not necessarily the case. Large 
deviations can often be observed between first- and higher-order models. The purpose of 
the manual analysis is to get a basic insight in the behavior of the circuit and to determine 
the dominant parameters. A detailed simulation is indispensable when quantitative data is 
required. Consider the example above a stroke of good luck. 
















The obvious question a designer asks herself at this point is how she can manipulate
and/or optimize the delay of a gate. To provide an answer to this question, it is necessary
to make the parameters governing the delay explicit by expanding $R_{eq}$ in the delay equation.
Combining Eq. (5.18) and Eq. (5.17), and assuming for the time being that the channel-length
modulation factor λ is ignorable, yields the following expression for $t_{pHL}$ (a
similar analysis holds for $t_{pLH}$)

$$t_{pHL} = 0.69\,\frac{3}{4}\,\frac{C_L V_{DD}}{I_{DSATn}} = 0.52\,\frac{C_L V_{DD}}{(W/L)_n\,k'_n\,V_{DSATn}\,(V_{DD} - V_{Tn} - V_{DSATn}/2)} \tag{5.21}$$

In the majority of designs, the supply voltage is chosen high enough so that $V_{DD} \gg V_{Tn} +
V_{DSATn}/2$. Under these conditions, the delay becomes virtually independent of the supply
voltage (Eq. (5.22)). Observe that this is a first-order approximation, and that increasing
the supply voltage yields an observable, albeit small, improvement in performance due to
a non-zero channel-length modulation factor.

$$t_{pHL} \approx 0.52\,\frac{C_L}{(W/L)_n\,k'_n\,V_{DSATn}} \tag{5.22}$$



This analysis is confirmed in Figure 5.17, which plots the propagation delay of the 
inverter as a function of the supply voltage. It comes as no surprise that this curve is virtu- 
ally identical in shape to the one of Figure 3.27, which charts the equivalent on-resistance 
of the MOS transistor as a function of $V_{DD}$. While the delay is relatively insensitive to supply
variations for higher values of $V_{DD}$, a sharp increase can be observed starting around
2$V_T$. This operation region should clearly be avoided if achieving high performance is a
premier design goal.

Figure 5.17 Propagation delay of CMOS inverter as a function of supply voltage (normalized
with respect to the delay at 2.5 V). The dots indicate the delay values predicted by
Eq. (5.21). Observe that this equation is only valid when the devices are velocity-saturated.
Hence, the deviation at low supply voltages.
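
The supply dependence predicted by Eq. (5.21) can be tabulated directly. In the sketch below all supply-independent factors cancel because the delay is normalized to its value at 2.5 V, so only the threshold and saturation voltages matter. The values 0.43 V and 0.63 V are the NMOS numbers used in Example 5.2; treating $C_L$ as independent of the supply is a simplifying assumption of this sketch, and the function name is illustrative.

```python
V_TN = 0.43      # NMOS threshold voltage in volts (value used in Example 5.2)
V_DSATN = 0.63   # NMOS velocity-saturation voltage in volts (Example 5.2)


def normalized_delay(vdd, vdd_ref=2.5):
    """t_pHL(vdd) / t_pHL(vdd_ref) according to Eq. (5.21), with C_L held constant."""
    def shape(v):
        headroom = v - V_TN - V_DSATN / 2
        if headroom <= 0:
            raise ValueError("Eq. (5.21) is not applicable at this supply voltage")
        return v / headroom
    return shape(vdd) / shape(vdd_ref)


for vdd in (2.5, 2.0, 1.5, 1.2, 1.0, 0.9):
    print(f"V_DD = {vdd:.1f} V: normalized t_pHL = {normalized_delay(vdd):.2f}")
```

The table stays close to 1 above roughly 2 V and rises steeply as the supply approaches twice the threshold voltage. Keep in mind that the equation assumes velocity-saturated devices, so the lowest entries only indicate the trend.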






Design Techniques 







From the above, we deduce that the propagation delay of a gate can be minimized in the
following ways:

• Reduce $C_L$. Remember that three major factors contribute to the load capacitance: the
internal diffusion capacitance of the gate itself, the interconnect capacitance, and the fanout.
Careful layout helps to reduce the diffusion and interconnect capacitances. Good
design practice requires keeping the drain diffusion areas as small as possible.

• Increase the W/L ratio of the transistors. This is the most powerful and effective performance
optimization tool in the hands of the designer. Proceed however with caution
when applying this approach. Increasing the transistor size also raises the diffusion
capacitance and hence $C_L$. In fact, once the intrinsic capacitance (i.e. the diffusion capacitance)
starts to dominate the extrinsic load formed by wiring and fanout, increasing the
gate size no longer helps in reducing the delay, and only makes the gate larger in
area. This effect is called “self-loading”. In addition, wide transistors have a larger gate
capacitance, which increases the fan-out factor of the driving gate and adversely affects
its speed.

• Increase $V_{DD}$. As illustrated in Figure 5.17, the delay of a gate can be modulated by
modifying the supply voltage. This flexibility allows the designer to trade off energy dissipation
for performance, as we will see in a later section. However, increasing the supply
voltage above a certain level yields only very minimal improvement and hence
should be avoided. Also, reliability concerns (oxide breakdown, hot-electron effects)
enforce firm upper-bounds on the supply voltage in deep sub-micron processes.
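
The self-loading effect mentioned in the second item can be made concrete with a small numerical experiment. The sketch below rests on a simple assumed model that is not spelled out in the text: widening both transistors by a factor S divides the equivalent resistance by S and multiplies the intrinsic (overlap plus diffusion) capacitance by S, while the extrinsic load stays fixed. The starting values are the high-to-low numbers of Table 5.2 and Example 5.5; the names are illustrative, and the extra load that a wider gate places on its own driver is ignored.

```python
R_EQ_REF = 13e3 / 1.5                    # ohm, NMOS on-resistance of the reference inverter
C_INT_REF = 0.23 + 0.61 + 0.66 + 1.5     # fF, overlap + diffusion capacitance (Table 5.2, H->L)
C_EXT = 0.76 + 2.28 + 0.12               # fF, fanout gates + wire (Table 5.2)


def t_phl(size):
    """0.69 * R * C for an inverter widened by `size` driving a fixed external load."""
    r_eq = R_EQ_REF / size
    c_load = (size * C_INT_REF + C_EXT) * 1e-15
    return 0.69 * r_eq * c_load


# Delay floor reached when the intrinsic capacitance dominates completely
intrinsic_limit = 0.69 * R_EQ_REF * C_INT_REF * 1e-15

for size in (1, 2, 4, 8, 16):
    print(f"S = {size:2d}: t_pHL = {t_phl(size) * 1e12:5.1f} ps")
print(f"floor set by self-loading: {intrinsic_limit * 1e12:.1f} ps")
```

Under these assumptions the first doubling of the width removes about a quarter of the delay, while going from 8 to 16 gains only a few percent, because the gate is then mostly charging its own diffusion capacitance.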




Problem 5.4 Propagation Delay as a Function of (dis)charge Current 



So far, we have expressed the propagation delay as a function of the equivalent resistance of
the transistors. Another approach would be to replace the transistor by a current source with
value equal to the average (dis)charge current over the interval of interest. Derive an expression
of the propagation delay using this alternative approach.



5.4.3 Propagation Delay from a Design Perspective 

Some interesting design considerations and trade-offs can be derived from the delay
expressions we have derived so far. Most importantly, they lead to a general approach 
towards transistor sizing that will prove to be extremely useful. 

NMOS/PMOS Ratio 

So far, we have consistently widened the PMOS transistor so that its resistance matches 
that of the pull-down NMOS device. This typically requires a ratio of 3 to 3.5 between 
PMOS and NMOS width. The motivation behind this approach is to create an inverter 
with a symmetrical VTC, and to equate the high-to-low and low-to-high propagation 
delays. However, this does not imply that this ratio also yields the minimum overall propagation
delay. If symmetry and reduced noise margins are not of prime concern, it is actually
possible to speed up the inverter by reducing the width of the PMOS device!

The reasoning behind this statement is that, while widening the PMOS improves the
$t_{pLH}$ of the inverter by increasing the charging current, it also degrades the $t_{pHL}$ because of
a larger parasitic capacitance. When two contradictory effects are present, there must exist
a transistor ratio that optimizes the propagation delay of the inverter.

This optimum ratio can be derived through the following simple analysis. Consider 
two identical, cascaded CMOS inverters. The load capacitance of the first gate equals 
approximately 

$$C_L = (C_{dp1} + C_{dn1}) + (C_{gp2} + C_{gn2}) + C_W \tag{5.23}$$

where $C_{dp1}$ and $C_{dn1}$ are the equivalent drain diffusion capacitances of the PMOS and NMOS
transistors of the first inverter, while $C_{gp2}$ and $C_{gn2}$ are the gate capacitances of the second
gate. $C_W$ represents the wiring capacitance.

When the PMOS devices are made $\beta$ times larger than the NMOS ones ($\beta = (W/L)_p / (W/L)_n$),
all transistor capacitances will scale in approximately the same way, or
$C_{dp1} \approx \beta C_{dn1}$ and $C_{gp2} \approx \beta C_{gn2}$. Eq. (5.23) can then be rewritten:

$$C_L = (1 + \beta)(C_{dn1} + C_{gn2}) + C_W \tag{5.24}$$

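An expression for the propagation delay can be derived, based on Eq. (5.20).

$$t_p = \frac{0.69}{2}\big((1 + \beta)(C_{dn1} + C_{gn2}) + C_W\big)\left(R_{eqn} + \frac{R_{eqp}}{\beta}\right)
     = 0.345\big((1 + \beta)(C_{dn1} + C_{gn2}) + C_W\big) R_{eqn}\left(1 + \frac{r}{\beta}\right) \tag{5.25}$$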



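$r$ ($= R_{eqp}/R_{eqn}$) represents the resistance ratio of identically sized PMOS and NMOS
transistors. The optimal value of $\beta$ can be found by setting $\partial t_p / \partial \beta$ to 0, which yields

$$\beta_{opt} = \sqrt{r\left(1 + \frac{C_W}{C_{dn1} + C_{gn2}}\right)} \tag{5.26}$$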



This means that when the wiring capacitance is negligible ($C_{dn1} + C_{gn2} \gg C_W$), $\beta_{opt}$
equals $\sqrt{r}$, in contrast to the factor $r$ normally used in the noncascaded case. If the wiring
capacitance dominates, larger values of $\beta$ should be used. The surprising result of this
analysis is that smaller device sizes (and hence smaller design area) yield a faster design at
the expense of symmetry and noise margin.
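
As a quick numerical illustration of Eq. (5.26), the sketch below evaluates the optimum ratio for a negligible and for a sizable wiring capacitance. The function name and the capacitance values are assumptions made for illustration only; the resistance ratio 31/13 is the one quoted in Example 5.6.

```python
import math

def beta_opt(r, c_wire, c_dev):
    """Eq. (5.26): optimum PMOS/NMOS ratio for two cascaded inverters.

    r      : R_eqp / R_eqn of identically sized devices
    c_wire : wiring capacitance C_W
    c_dev  : C_dn1 + C_gn2 (same units as c_wire)
    """
    return math.sqrt(r * (1.0 + c_wire / c_dev))

r = 31.0 / 13.0  # resistance ratio quoted in Example 5.6

# Negligible wiring capacitance: the optimum collapses to sqrt(r).
print(f"beta_opt (C_W = 0)     = {beta_opt(r, 0.0, 1.0):.2f}")

# Hypothetical case where the wire is as large as the device capacitance:
# a larger ratio becomes optimal.
print(f"beta_opt (C_W = C_dev) = {beta_opt(r, 1.0, 1.0):.2f}")
```

With these assumed inputs the first value lands close to the ratio of 1.6 predicted in Example 5.6.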



Example 5.6 Sizing of CMOS Inverter Loaded by an Identical Gate 

Consider again our standard design example. From the values of the equivalent resistances
(Table 3.3), we find that a ratio $\beta$ of 2.4 (= 31 kΩ / 13 kΩ) would yield a symmetrical
transient response. Eq. (5.26) now predicts that the device ratio for an optimal performance
should equal 1.6. These results are verified in Figure 5.18, which plots the simulated
propagation delay as a function of the transistor ratio $\beta$. The graph clearly illustrates how a changing
$\beta$ trades off between $t_{pLH}$ and $t_{pHL}$. The optimum point occurs around $\beta$ = 1.9, which is
somewhat higher than predicted. Observe also that the rising and falling delays are identical at the
predicted point of $\beta$ equal to 2.4.



Figure 5.18 Propagation delay of CMOS inverter as a
function of the PMOS/NMOS transistor ratio $\beta$.






Sizing Inverters for Performance 

In this analysis, we assume a symmetrical inverter, that is, an inverter where PMOS and
NMOS are sized such that the rise and fall delays are identical. The load capacitance of the
inverter can be divided into an intrinsic and an extrinsic component, or $C_L = C_{int} + C_{ext}$.
$C_{int}$ represents the self-loading or intrinsic output capacitance of the inverter, and is
associated with the diffusion capacitances of the NMOS and PMOS transistors as well as the
gate-drain overlap (Miller) capacitances. $C_{ext}$ is the extrinsic load capacitance, attributable
to fanout and wiring capacitance. Assuming that $R_{eq}$ stands for the equivalent resistance of
the gate, we can express the propagation delay as follows:

$$t_p = 0.69 R_{eq}(C_{int} + C_{ext})
     = 0.69 R_{eq} C_{int}(1 + C_{ext}/C_{int}) = t_{p0}(1 + C_{ext}/C_{int}) \tag{5.27}$$

$t_{p0} = 0.69 R_{eq} C_{int}$ represents the delay of the inverter only loaded by its own intrinsic
capacitance ($C_{ext} = 0$), and is called the intrinsic or unloaded delay.

The next question is how transistor sizing impacts the performance of the gate. To
answer it, we must establish the relationship between the various parameters in Eq. (5.27) and
the sizing factor S, which relates the transistor sizes of our inverter to a reference
gate, typically a minimum-sized inverter. The intrinsic capacitance $C_{int}$ consists of the
diffusion and Miller capacitances, both of which are proportional to the width of the
transistors. Hence, $C_{int} = S C_{iref}$. The resistance of the gate relates to the reference gate as
$R_{eq} = R_{ref}/S$. We can now rewrite Eq. (5.27),



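$$t_p = 0.69 (R_{ref}/S)(S C_{iref})\left(1 + \frac{C_{ext}}{S C_{iref}}\right)
     = 0.69 R_{ref} C_{iref}\left(1 + \frac{C_{ext}}{S C_{iref}}\right)
     = t_{p0}\left(1 + \frac{C_{ext}}{S C_{iref}}\right) \tag{5.28}$$
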
This leads to two important conclusions:

• The intrinsic delay of the inverter $t_{p0}$ is independent of the sizing of the gate, and is
purely determined by technology and inverter layout. When no load is present, an
increase in the drive of the gate is totally offset by the increased capacitance.

• Making S infinitely large yields the maximum obtainable performance gain, eliminating
the impact of any external load, and reducing the delay to the intrinsic one.
Yet, any sizing factor S that is sufficiently larger than ($C_{ext}/C_{int}$) produces similar
results at a substantial saving in silicon area.



Example 5.7 Device Sizing for Performance 



Let us explore the performance improvement that can be obtained by device sizing in the
design of Example 5.5. We find from Table 5.2 that $C_{ext}/C_{int} \approx 1.05$ ($C_{int}$ = 3.0 fF, $C_{ext}$ = 3.15
fF). This would predict a maximum performance gain of 2.05. A scaling factor of 10 allows
us to get within 10% of this optimal performance, while larger device sizes only yield
negligible performance gains.

This is confirmed by simulation results, which predict a maximum obtainable performance
improvement of 1.9 ($t_{p0}$ = 19.3 psec). From the graph of Figure 5.19, we observe that
the bulk of the improvement is already obtained for S = 5, and that sizing factors larger than
10 barely yield any extra gain.

Figure 5.19 Increasing inverter performance by
sizing the NMOS and PMOS transistor with an
identical factor S for a fixed fanout (inverter of
Figure 5.15).
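
The diminishing return of upsizing can be tabulated directly from Eq. (5.28). The sketch below is only an illustration of the first-order model; the function name is hypothetical, and the ratio 1.05 is the value quoted in Example 5.7.

```python
def normalized_delay(s, c_ext_over_c_int):
    """Eq. (5.28): t_p / t_p0 for an inverter scaled by S (reference gate: S = 1)."""
    return 1.0 + c_ext_over_c_int / s

ratio = 1.05  # C_ext / C_int of the reference inverter (Example 5.7)

for s in (1, 2, 5, 10, 100):
    # Speed-up relative to the unscaled (S = 1) inverter.
    gain = normalized_delay(1, ratio) / normalized_delay(s, ratio)
    print(f"S = {s:3d}: t_p/t_p0 = {normalized_delay(s, ratio):.3f}, speed-up = {gain:.2f}")
```

The speed-up saturates toward 1 + C_ext/C_int as S grows, which is the limit discussed in the example.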



Sizing A Chain of Inverters 

While sizing up an inverter reduces its delay, it also increases its input capacitance. Gate 
sizing in an isolated fashion without taking into account its impact on the delay of the pre- 
ceding gates is a purely academic enterprise. Therefore, a more relevant problem is deter- 
mining the optimum sizing of a gate when embedded in a real environment. A simple 
chain of inverters is a good first case to study. To determine the input loading effect, the 
relationship between the input gate capacitance C g and the intrinsic output capacitance of 
the inverter has to be established. Both are proportional to the gate sizing. Hence, the fol- 
lowing relationship holds, independent of gate sizing 

$$C_{int} = \gamma C_g \tag{5.29}$$

$\gamma$ is a proportionality factor, which is only a function of technology and is close to 1 for
most sub-micron processes. Rewriting Eq. (5.28),

$$t_p = t_{p0}\left(1 + \frac{C_{ext}}{\gamma C_g}\right) = t_{p0}(1 + f/\gamma) \tag{5.30}$$

shows that the delay of the inverter is only a function of the ratio between its external load
capacitance and input capacitance. This ratio is called the effective fanout $f$.

Let us consider the circuit of Figure 5.20. The goal is to minimize the delay
through the inverter chain, with the input capacitance of the first inverter $C_{g,1}$ (typically a
minimally-sized device) and the load capacitance $C_L$ fixed.






Figure 5.20 Chain of N inverters with fixed 
input and output capacitance. 



Given the delay expression for the j-th inverter stage, 2

$$t_{p,j} = t_{p0}\left(1 + \frac{C_{g,j+1}}{\gamma C_{g,j}}\right) = t_{p0}(1 + f_j/\gamma) \tag{5.31}$$

we can derive the total delay of the chain.

$$t_p = \sum_{j=1}^{N} t_{p,j} = t_{p0}\sum_{j=1}^{N}\left(1 + \frac{C_{g,j+1}}{\gamma C_{g,j}}\right), \quad \text{with } C_{g,N+1} = C_L \tag{5.32}$$

This equation has N - 1 unknowns, being $C_{g,2}, C_{g,3}, \ldots, C_{g,N}$. The minimum delay can be
found by taking N - 1 partial derivatives, and equating them to 0, or $\partial t_p / \partial C_{g,j} = 0$. The
result is a set of constraints, $C_{g,j+1}/C_{g,j} = C_{g,j}/C_{g,j-1}$. In other words, the optimum size of
each inverter is the geometric mean of its neighbors' sizes,

$$C_{g,j} = \sqrt{C_{g,j-1}\,C_{g,j+1}} \tag{5.33}$$

Overall, this means that each inverter is sized up by the same factor $f$ with respect to the
preceding gate, has the same effective fanout ($f_j = f$), and hence the same delay. With $C_{g,1}$
and $C_L$ given, we can derive the sizing factor,

$$f = \sqrt[N]{C_L / C_{g,1}} = \sqrt[N]{F} \tag{5.34}$$

and the minimum delay through the chain,

$$t_p = N t_{p0}\left(1 + \sqrt[N]{F}/\gamma\right) \tag{5.35}$$

2 This expression ignores the wiring capacitance, which is a fair assumption for the time being.

$F$ represents the overall effective fanout of the circuit, and equals $C_L/C_{g,1}$. Observe how the
relationship between $t_p$ and $F$ is a very strong function of the number of stages. As
expected, the relationship is linear when only 1 stage is present. Introducing a second
stage turns it into a square root, and so on. The obvious question now is how to choose the
number of stages so that the delay is minimized for a given value of $F$.
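
The sizing rule of Eqs. (5.33) to (5.35) is easy to turn into a small calculation. The sketch below is a minimal illustration, not a design tool: the function name and the numerical inputs ($C_{g,1}$ = 1, $C_L$ = 64, N = 3, $\gamma$ = 1) are assumptions chosen only to produce round numbers.

```python
def size_chain(c_g1, c_load, n_stages, gamma=1.0):
    """Size an N-stage inverter chain for minimum delay, Eqs. (5.34)-(5.35).

    Returns the per-stage effective fanout f, the list of input gate
    capacitances C_g,1 ... C_g,N, and the total delay in units of t_p0.
    """
    big_f = c_load / c_g1                  # overall effective fanout F
    f = big_f ** (1.0 / n_stages)          # identical fanout for every stage
    sizes = [c_g1 * f ** j for j in range(n_stages)]
    delay = n_stages * (1.0 + f / gamma)   # Eq. (5.35), normalized to t_p0
    return f, sizes, delay

# Hypothetical example: load 64 times the input capacitance, three stages.
f, sizes, delay = size_chain(c_g1=1.0, c_load=64.0, n_stages=3)
print(f"f = {f:.2f}, sizes = {[round(s, 2) for s in sizes]}, t_p/t_p0 = {delay:.2f}")
```

Each entry of `sizes` is the geometric mean of its neighbors, with the load capacitance acting as the virtual stage N + 1, as required by Eq. (5.33).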



Choosing the Right Number of Stages in an Inverter Chain 

Evaluation of Eq. (5.35) reveals the trade-offs in choosing the number of stages for a
given $F$ ($= f^N$). When the number of stages is too large, the first component of the equation,
representing the intrinsic delay of the stages, becomes dominant. If the number of stages is
too small, the effective fanout of each stage becomes large, and the second component is
dominant. The optimum value can be found by differentiating the minimum delay
expression by the number of stages, and setting the result to 0.

$$\gamma + \sqrt[N]{F} - \frac{\sqrt[N]{F}\,\ln F}{N} = 0$$

or equivalently

$$f = e^{(1 + \gamma/f)} \tag{5.36}$$



This equation only has a closed-form solution for $\gamma = 0$, that is, when the self-loading is
ignored and the load capacitance only consists of the fanout. Under these simplified
conditions, it is found that the optimal number of stages equals $N = \ln(F)$, and the effective
fanout of each stage is set to $f = 2.71828 = e$. This optimal buffer design scales consecutive
stages in an exponential fashion, and is hence called an exponential horn [Mead79]. When
self-loading is included, Eq. (5.36) can only be solved numerically. The results are plotted
in Figure 5.21a. For the typical case of $\gamma = 1$, the optimum scaling factor turns out to be close
to 3.6. Figure 5.21b plots the (normalized) propagation delay of the inverter chain as a
function of the effective fanout for $\gamma = 1$. Choosing values of the fanout that are higher
than the optimum does not impact the delay that much, and reduces the required number
of buffer stages and the implementation area. A common practice is to select an optimum
fanout of 4. The use of too many stages ($f < f_{opt}$), on the other hand, has a substantial
negative impact on the delay, and should be avoided.
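
One possible way to solve Eq. (5.36) numerically is a plain fixed-point iteration, sketched below. The choice of method, the function name, and the tolerance are assumptions; any root finder would do. The iteration is expected to converge for the small values of $\gamma$ that are of practical interest.

```python
import math

def optimal_fanout(gamma, tol=1e-12, max_iter=1000):
    """Solve f = exp(1 + gamma / f), Eq. (5.36), by fixed-point iteration."""
    f = math.e  # exact solution for gamma = 0, used as the starting guess
    for _ in range(max_iter):
        f_next = math.exp(1.0 + gamma / f)
        if abs(f_next - f) < tol:
            return f_next
        f = f_next
    return f

for gamma in (0.0, 0.5, 1.0, 2.0):
    print(f"gamma = {gamma:.1f}: f_opt = {optimal_fanout(gamma):.2f}")
```

For $\gamma = 0$ the routine returns $e$, and for $\gamma = 1$ it returns a value close to 3.6, in line with the text.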



Example 5.8 The Impact of Introducing Buffer Stages 

Table 5.3 enumerates the values of $t_{p,opt}/t_{p0}$ for the unbuffered design, the dual stage, and
optimized inverter chain for a variety of values of $F$ (for $\gamma = 1$). Observe the impressive
speed-up obtained with cascaded inverters when driving very large capacitive loads.





The above analysis can be extended to cover not only chains of inverters, but also
networks of inverters that contain actual fanout, an example of which is shown in Figure
5.22. We only have to adjust the expression for $C_{ext}$ to incorporate the additional
fanout factors.

(a) Optimum effective fanout $f$ (or inverter scaling factor) as a function of the self-loading
factor $\gamma$ in an inverter chain.
(b) Normalized propagation delay ($t_p/t_{p,opt}$) as a function of the effective fanout $f$ for $\gamma = 1$.

Figure 5.21 Optimizing the number of stages in an inverter chain.



Table 5.3 $t_{p,opt}/t_{p0}$ versus $F$ for various driver configurations.

F         Unbuffered    Two Stage    Inverter Chain
10        11            8.3          8.3
100       101           22           16.5
1000      1001          65           24.8
10,000    10,001        202          33.1
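
The entries of Table 5.3 can be approximated from Eq. (5.35) with $\gamma = 1$. The sketch below assumes a per-stage fanout of 3.6 for the optimized chain (the value quoted in the text) and treats the number of stages as a real number, which is an idealization; the function names are hypothetical.

```python
import math

F_OPT = 3.6  # optimum effective fanout for gamma = 1, as quoted in the text

def unbuffered(big_f):
    """Single inverter driving the full load: N = 1 in Eq. (5.35)."""
    return 1.0 + big_f

def two_stage(big_f):
    """Two inverters, each with effective fanout sqrt(F)."""
    return 2.0 * (1.0 + math.sqrt(big_f))

def optimal_chain(big_f, f=F_OPT):
    """Chain with fanout f per stage; N = ln(F) / ln(f), not rounded (assumption)."""
    n = math.log(big_f) / math.log(f)
    return n * (1.0 + f)

for big_f in (10, 100, 1000, 10000):
    print(f"F = {big_f:6d}: {unbuffered(big_f):8.0f} {two_stage(big_f):7.1f} {optimal_chain(big_f):6.1f}")
```

A practical design has to round N to an integer, so the delay of a real chain is slightly larger than the idealized last column.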



Problem 5.5 Sizing an Inverter Network 



Determine the sizes of the inverters in the circuit of Figure 5.22, such that the delay
between nodes Out and In is minimized. You may assume that $C_L = 64\,C_{g,1}$.

Figure 5.22 Inverter network, in which each
gate has a fanout of 4 gates, distributing a single
input to 16 output signals in a tree-like fashion.

Hints: Determine first the ratios between the devices that minimize the delay. You should
find that the following must hold:

$$\frac{4C_{g,2}}{C_{g,1}} = \frac{4C_{g,3}}{C_{g,2}} = \frac{C_L}{C_{g,3}}$$

Finding the actual gate sizes ($C_{g,3} = 2.52\,C_{g,2} = 6.35\,C_{g,1}$) is a relatively straightforward
task. Straightforward sizing of the inverter chain, without taking the fanout into account,
would have led to a sizing factor of 4 instead of 2.52.
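
The numbers in the hint can be checked with a few lines of arithmetic. The sketch below simply multiplies the three equal ratios of the hint; the variable names are illustrative.

```python
fanout = 4          # each inverter drives 4 copies of the next stage
load_ratio = 64.0   # C_L / C_g,1

# The three ratios in the hint are equal (call the common value k). Their product is
# (4 C_g2 / C_g1) * (4 C_g3 / C_g2) * (C_L / C_g3) = 16 * C_L / C_g1 = k**3.
k = (fanout ** 2 * load_ratio) ** (1.0 / 3.0)

step = k / fanout   # C_g,2 / C_g,1 = C_g,3 / C_g,2
print(f"k = {k:.2f}, C_g,2/C_g,1 = {step:.2f}, C_g,3/C_g,1 = {step ** 2:.2f}")
```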





The Rise/Fall Time of the Input Signal



All the above expressions were derived under the assumption that the input signal to the 
inverter abruptly changed from 0 to V DD or vice-versa. Only one of the devices is assumed 
to be on during the (dis)charging process. In reality, the input signal changes gradually 
and, temporarily, PMOS and NMOS transistors conduct simultaneously. This affects the 
total current available for (dis)charging and impacts the propagation delay. Figure 5.23 
plots the propagation delay of a minimum-size inverter as a function of the input signal 
slope, as obtained from SPICE. It can be observed that $t_p$ increases (approximately)
linearly with increasing input slope, once $t_s > t_p(t_s = 0)$.




Figure 5.23 $t_p$ as a function of the
input signal slope (10-90% rise or
fall time) for a minimum-size
inverter with fan-out of a single
gate.



While it is possible to derive an analytical expression describing the relationship 
between input signal slope and propagation delay, the result tends to be complex and of 
limited value. From a design perspective, it is more valuable to relate the impact of the 
finite slope on the performance directly to its cause, which is the limited driving capability 
of the preceding gate. If the latter were infinitely strong, its output slope would be
zero, and the performance of the gate under examination would be unaffected. The 
strength of this approach is that it realizes that a gate is never designed in isolation, and 
that its performance is both affected by the fanout, and the driving strength of the gate(s) 
feeding into its inputs. This leads to a revised expression for the propagation delay of an 
inverter i in a chain of inverters [Hedenstierna87]: 

$$t_p^{i} = t_{step}^{i} + \eta\, t_{step}^{i-1} \tag{5.37}$$

Eq. (5.37) states that the propagation delay of inverter $i$ equals the sum of the delay of the
same gate for a step input ($t_{step}^{i}$) (i.e. zero input slope) augmented with a fraction of the
step-input delay of the preceding gate ($i-1$). The fraction $\eta$ is an empirical constant, which
typically has values around 0.25. This expression has the advantage of being very simple,
while exposing all relationships necessary for global delay computations of complex
circuits.






Example 5.9 Delay of Inverter embedded in Network 

Consider for instance the circuit of Figure 5.22. With the aid of Eq. (5.31) and Eq. (5.37), we
can derive an expression for the delay of the stage-2 inverter, marked by the gray box.

$$t_{p,2} = t_{p0}\left(1 + \frac{4C_{g,3}}{\gamma C_{g,2}}\right) + \eta\, t_{p0}\left(1 + \frac{4C_{g,2}}{\gamma C_{g,1}}\right)$$

An analysis of the overall propagation delay in the style of Problem 5.5 leads to the following
revised sizing requirements for minimum delay,

$$\frac{4(1+\eta)C_{g,2}}{C_{g,1}} = \frac{4(1+\eta)C_{g,3}}{C_{g,2}} = \frac{C_L}{C_{g,3}}$$

or $f_2 = f_1 = 2.47$ (assuming $\eta = 0.25$).







Design Challenge 



It is advantageous to keep the signal rise times smaller than or equal to the gate propagation 
delays. This proves to be true not only for performance, but also for power consumption con- 
siderations as will be discussed later. Keeping the rise and fall times of the signals small and of 
approximately equal values is one of the major challenges in high-performance design, and is 
often called 'slope engineering'.



Problem 5.6 Impact of input slope 

Determine if reducing the supply voltage increases or decreases the influence of the input 
signal slope on the propagation delay. Explain your answer. 



Delay in the Presence of (Long) Interconnect Wires 

The interconnect wire has played a minimal role in our analysis so far. When gates get far- 
ther apart, the wire capacitance and resistance can no longer be ignored, and may even 
dominate the transient response. Earlier delay expressions can be adjusted to accommo- 
date these extra contributions by employing the wire modeling techniques introduced in 
the previous Chapter. The analysis detailed in Example 4.9 is directly applicable to the
problem at hand. Consider the circuit of Figure 5.24, where an inverter drives a single
fanout through a wire of length $L$. The driver is represented by a single resistance $R_{dr}$,
which is the average between $R_{eqn}$ and $R_{eqp}$. $C_{int}$ and $C_{fan}$ account for the intrinsic
capacitance of the driver, and the input capacitance of the fanout gate, respectively.



Figure 5.24 Inverter driving single fanout through wire of
length L (wire parameters $r_w$, $c_w$, $L$).





The propagation delay of the circuit can be obtained by applying the Elmore delay
expression.

$$t_p = 0.69 R_{dr} C_{int} + (0.69 R_{dr} + 0.38 R_w) C_w + 0.69 (R_{dr} + R_w) C_{fan}
     = 0.69 R_{dr}(C_{int} + C_{fan}) + 0.69 (R_{dr} c_w + r_w C_{fan}) L + 0.38\, r_w c_w L^2 \tag{5.38}$$



The 0.38 factor accounts for the fact that the wire represents a distributed delay. $C_w$ and $R_w$
stand for the total capacitance and resistance of the wire, respectively. The delay
expression contains a component that is linear with the wire length, as well as a quadratic one. It is
the latter that causes the wire delay to rapidly become the dominant factor in the delay
budget for longer wires.




Example 5.10 Inverter delay in presence of interconnect 

Consider the circuit of Figure 5.24, and assume the device parameters of Example 5.5: $C_{int}$ =
3 fF, $C_{fan}$ = 3 fF, and $R_{dr}$ = 0.5(13/1.5 + 31/4.5) = 7.8 kΩ. The wire is implemented in metal1
and has a width of 0.4 μm, the minimum allowed. This yields the following parameters: $c_w$ =
92 aF/μm, and $r_w$ = 0.19 Ω/μm (Example 4.4). With the aid of Eq. (5.38), we can compute at
what wire length the delay of the interconnect becomes equal to the intrinsic delay caused
purely by device parasitics. Solving the following quadratic equation (with $L$ in μm) yields a
single (meaningful) solution.

$$6.6 \times 10^{-18} L^2 + 0.5 \times 10^{-12} L = 32.29 \times 10^{-12}$$

or

$$L = 65\ \mu\text{m}$$

Observe that the extra delay is solely due to the linear factor in the equation, and more
specifically due to the extra capacitance introduced by the wire. The quadratic factor (that is, the
distributed wire delay) only becomes dominant at much larger wire lengths (> 7 cm). This is
due to the high resistance of the (minimum-size) driver transistors. A different balance
emerges when wider transistors are used. Analyze, for instance, the same problem with the
driver transistors 100 times wider, as is typical for high-speed, large fan-out drivers.
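
The quadratic of Example 5.10 can be reproduced from Eq. (5.38) and the quoted parameters. The sketch below is a minimal check; the variable names are illustrative, and lengths are kept in μm to match the per-unit-length wire parameters.

```python
import math

R_DR = 7.8e3    # driver resistance, ohm
C_INT = 3e-15   # intrinsic driver capacitance, F
C_FAN = 3e-15   # fanout gate capacitance, F
c_w = 92e-18    # wire capacitance per unit length, F/um
r_w = 0.19      # wire resistance per unit length, ohm/um

# Coefficients of Eq. (5.38) written as t_p = c + b*L + a*L**2 (L in um).
a = 0.38 * r_w * c_w
b = 0.69 * (R_DR * c_w + r_w * C_FAN)
c = 0.69 * R_DR * (C_INT + C_FAN)

# Wire length at which the wire-related delay equals the intrinsic delay c.
length_equal = (-b + math.sqrt(b * b + 4.0 * a * c)) / (2.0 * a)

# Length beyond which the quadratic term exceeds the linear term.
length_quadratic = b / a

print(f"a = {a:.2e}, b = {b:.2e}, c = {c:.2e}")
print(f"equal-delay length       = {length_equal:.0f} um")
print(f"quadratic term dominates > {length_quadratic / 1e4:.1f} cm")
```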

5.5 Power, Energy, and Energy-Delay

So far, we have seen that the static CMOS inverter with its almost ideal VTC — symmetri- 
cal shape, full logic swing, and high noise margins — offers a superior robustness, which 
simplifies the design process considerably and opens the door for design automation. 
Another major attractor for static CMOS is the almost complete absence of power con- 
sumption in steady-state operation mode. It is this combination of robustness and low 
static power that has made static CMOS the technology of choice of most contemporary 
digital designs. The power dissipation of a CMOS circuit is instead dominated by the 
dynamic dissipation resulting from charging and discharging capacitances. 




5.5.1 Dynamic Power Consumption 



Dynamic Dissipation due to Charging and Discharging Capacitances 

Each time the capacitor C L gets charged through the PMOS transistor, its voltage rises 
from 0 to V DD , and a certain amount of energy is drawn from the power supply. Part of this 
energy is dissipated in the PMOS device, while the remainder is stored on the load capac- 
itor. During the high-to-low transition, this capacitor is discharged, and the stored energy 
is dissipated in the NMOS transistor. 3 

A precise measure for this energy consump- 
tion can be derived. Let us first consider the low-to- 
high transition. We assume, initially, that the input 
waveform has zero rise and fall times, or, in other 
words, that the NMOS and PMOS devices are never 
on simultaneously. Therefore, the equivalent circuit 
of Figure 5.25 is valid. The values of the energy 
E vdd , taken from the supply during the transition, as 
well as the energy E c , stored on the capacitor at the 
end of the transition, can be derived by integrating 
the instantaneous power over the period of interest. 

The corresponding waveforms of v out (t) and i VDD (t ) 
are pictured in Figure 5.26. 




Figure 5.25 Equivalent circuit 
during the low-to-high transition. 



$$E_{VDD} = \int_0^{\infty} i_{VDD}(t)\, V_{DD}\, dt = V_{DD} \int_0^{\infty} C_L \frac{dv_{out}}{dt}\, dt
         = C_L V_{DD} \int_0^{V_{DD}} dv_{out} = C_L V_{DD}^2 \tag{5.39}$$




3 Observe that this model is a simplification of the actual circuit. In reality, the load capacitance consists 
of multiple components some of which are located between the output node and GND, others between output 
node and V DD. The latter experience a charge-discharge cycle that is out of phase with the capacitances to GND,
i.e. they get charged when V out goes low and discharged when V out rises. While this distributes the energy deliv- 
ery by the supply over the two phases, it does not impact the overall dissipation, and the results presented in this 
section are still valid. 

$$E_C = \int_0^{\infty} i_{VDD}(t)\, v_{out}\, dt = \int_0^{\infty} C_L \frac{dv_{out}}{dt}\, v_{out}\, dt
     = C_L \int_0^{V_{DD}} v_{out}\, dv_{out} = \frac{C_L V_{DD}^2}{2} \tag{5.40}$$




These results can also be derived by observing that during the low-to-high transi- 
tion, C L is loaded with a charge C L V DD . Providing this charge requires an energy from the 
supply equal to C L V DD 2 (= Q x V DD ). The energy stored on the capacitor equals C L V DD 2 /2. 
This means that only half of the energy supplied by the power source is stored on C L . The 
other half has been dissipated by the PMOS transistor. Notice that this energy dissipation 
is independent of the size (and hence the resistance) of the PMOS device! During the dis- 
charge phase, the charge is removed from the capacitor, and its energy is dissipated in the 
NMOS device. Once again, there is no dependence on the size of the device. In summary, 
each switching cycle (consisting of an L→H and an H→L transition) takes a fixed amount
of energy, equal to $C_L V_{DD}^2$. In order to compute the power consumption, we have to take
into account how often the device is switched. If the gate is switched on and off $f_{0\to1}$ times
per second, the power consumption equals

$$P_{dyn} = C_L V_{DD}^2 f_{0\to1} \tag{5.41}$$

$f_{0\to1}$ represents the frequency of energy-consuming transitions, that is, 0→1 transitions
for static CMOS.
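
The claim that the dissipated energy does not depend on the device size can be illustrated numerically. The sketch below integrates the supply and capacitor energies of Eqs. (5.39) and (5.40) for a step-driven charging event, under the simplifying assumption that the PMOS behaves as a fixed resistor; the function name, the resistance values, and the step count are all illustrative.

```python
import math

def charging_energies(r_on, c_load, v_dd, steps=20000):
    """Numerically integrate E_VDD and E_C for an RC charge from 0 to V_DD.

    The pull-up device is modeled as a constant resistor r_on (an assumption;
    a real transistor is nonlinear, but the charge delivered is the same).
    """
    tau = r_on * c_load
    dt = 20.0 * tau / steps            # integrate out to 20 time constants
    e_supply = 0.0
    e_cap = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt             # midpoint rule
        i = (v_dd / r_on) * math.exp(-t / tau)
        v_out = v_dd * (1.0 - math.exp(-t / tau))
        e_supply += i * v_dd * dt      # integrand of Eq. (5.39)
        e_cap += i * v_out * dt        # integrand of Eq. (5.40)
    return e_supply, e_cap

for r_on in (1e3, 10e3, 100e3):        # three very different device strengths
    e_supply, e_cap = charging_energies(r_on, 6e-15, 2.5)
    print(f"R = {r_on:8.0f} ohm: E_VDD = {e_supply * 1e15:.2f} fJ, E_C = {e_cap * 1e15:.2f} fJ")
```

In this model the resistance only changes how long the transition takes, not the energy drawn from the supply or the half of it that is dissipated.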

Advances in technology result in ever-higher values of $f_{0\to1}$ (as $t_p$ decreases). At
the same time, the total capacitance on the chip ($C_L$) increases as more and more gates are
placed on a single die. Consider for instance a 0.25 μm CMOS chip with a clock rate of
500 MHz and an average load capacitance of 15 fF/gate, assuming a fanout of 4. The
power consumption per gate for a 2.5 V supply then equals approximately 50 μW. For a
design with 1 million gates and assuming that a transition occurs at every clock edge, this
would result in a power consumption of 50 W! This evaluation presents, fortunately, a
pessimistic perspective. In reality, not all gates in the complete IC switch at the full rate of
500 MHz. The actual activity in the circuit is substantially lower.



Example 5.11 Capacitive power dissipation of inverter 

The capacitive dissipation of the CMOS inverter of Example 5.4 is now easily computed. In
Table 5.2, the value of the load capacitance was determined to equal 6 fF. For a supply
voltage of 2.5 V, the amount of energy needed to charge and discharge that capacitance equals

$$E_{dyn} = C_L V_{DD}^2 = 37.5\ \text{fJ}$$

Assume that the inverter is switched at the maximum possible rate ($T = 1/f = t_{pLH} + t_{pHL}
= 2t_p$). For a $t_p$ of 32.5 psec (Example 5.5), we find that the dynamic power dissipation of the
circuit is

$$P_{dyn} = E_{dyn}/(2t_p) = 580\ \mu\text{W}$$

Of course, an inverter in an actual circuit is rarely switched at this maximum rate, and
even if done so, the output does not swing from rail-to-rail. The power dissipation will hence
be substantially lower. For a rate of 4 GHz (T = 250 psec), the dissipation reduces to 150 μW.
This is confirmed by simulations, which yield a power consumption of 155 μW.



Computing the dissipation of a complex circuit is complicated by the $f_{0\to1}$ factor,
also called the switching activity. While the switching activity is easily computed for an
inverter, it turns out to be far more complex in the case of higher-order gates and circuits.
One concern is that the switching activity of a network is a function of the nature and the
statistics of the input signals: If the input signals remain unchanged, no switching
happens, and the dynamic power consumption is zero! On the other hand, rapidly changing
signals provoke plenty of switching and hence dissipation. Other factors influencing the
activity are the overall network topology and the function to be implemented. We can
accommodate this by another rewrite of the equation, or

$$P_{dyn} = C_L V_{DD}^2 f_{0\to1} = C_L V_{DD}^2 P_{0\to1} f = C_{EFF} V_{DD}^2 f \tag{5.42}$$

where $f$ now represents the maximum possible event rate of the inputs (which is often the
clock rate) and $P_{0\to1}$ the probability that a clock event results in a 0→1 (or
power-consuming) event at the output of the gate. $C_{EFF} = P_{0\to1} C_L$ is called the effective capacitance
and represents the average capacitance switched every clock cycle. For our example, an
activity factor of 10% ($P_{0\to1}$ = 0.1) reduces the average consumption to 5 W.
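
The chip-level estimate can be retraced with Eq. (5.42). The sketch below uses the figures quoted in the text (15 fF per gate, 2.5 V, 500 MHz, 1 million gates); the function name is hypothetical. The text rounds the per-gate result up to 50 μW, so the printed values come out slightly lower.

```python
def dynamic_power(c_load, v_dd, f_clk, p_0_to_1=1.0):
    """Eq. (5.42): P_dyn = P_0->1 * C_L * V_DD**2 * f."""
    return p_0_to_1 * c_load * v_dd ** 2 * f_clk

N_GATES = 1_000_000

per_gate = dynamic_power(15e-15, 2.5, 500e6)                   # one transition per clock
chip_full = N_GATES * per_gate                                 # pessimistic: every gate switches
chip_10pct = N_GATES * dynamic_power(15e-15, 2.5, 500e6, 0.1)  # 10% activity

print(f"per gate           : {per_gate * 1e6:.1f} uW")
print(f"chip, full activity: {chip_full:.1f} W")
print(f"chip, 10% activity : {chip_10pct:.2f} W")
```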

Example 5.12 Switching activity 

Consider the waveforms of Figure 5.27, where the upper waveform represents the idealized
clock signal, and the bottom one shows the signal at the output of the gate.
Power-consuming transitions occur 2 out of 8 times, which is equivalent to a transition
probability of 0.25 (or 25%).

Figure 5.27 Clock and signal waveforms (top: Clock; bottom: Output signal).



Low Energy/Power Design Techniques 



With the increasing complexity of the digital integrated circuits, it is anticipated that the power 
problem will only worsen in future technologies. This is one of the reasons that lower supply 
voltages are becoming more and more attractive. Reducing $V_{DD}$ has a quadratic effect on
$P_{dyn}$. For instance, reducing $V_{DD}$ from 2.5 V to 1.25 V for our example drops the power
dissipation from 5 W to 1.25 W. This assumes that the same clock rate can be sustained. Figure 5.17
demonstrates that this assumption is not that unrealistic as long as the supply voltage is
substantially higher than the threshold voltage. An important performance penalty occurs once
$V_{DD}$ approaches $2V_T$.

When a lower bound on the supply voltage is set by external constraints (as often hap- 
pens in real-world designs), or when the performance degradation due to lowering the supply 
voltage is intolerable, the only means of reducing the dissipation is by lowering the effective 
capacitance. This can be achieved by addressing both of its components: the physical capaci- 
tance and the switching activity. 

A reduction in the switching activity can only be accomplished at the logic and architec- 
tural abstraction levels, and will be discussed in more detail in later Chapters. Lowering the 
physical capacitance is an overall worthwhile goal, which also helps to improve the perfor- 
mance of the circuit. As most of the capacitance in a combinational logic circuit is due to tran- 
sistor capacitances (gate and diffusion), it makes sense to keep those contributions to a 
minimum when designing for low power. This means that transistors should be kept to minimal 
size whenever possible or reasonable. This definitely affects the performance of the circuit, but 
the effect can be offset by using logic or architectural speed-up techniques. The only instances 
where transistors should be sized up are those in which the load capacitance is dominated by extrinsic
capacitances (such as fan-out or wiring capacitance). This is contrary to common design prac- 
tices used in cell libraries, where transistors are generally made large to accommodate a range 
of loading and performance requirements. 

The above observations lead to an interesting design challenge. Assume we have to min- 
imize the energy dissipation of a circuit with a specified lower-bound on the performance. An 
attractive approach is to lower the supply voltage as much as possible, and to compensate the 
loss in performance by increasing the transistor sizes. Yet, the latter causes the capacitance to 
increase. It may be foreseen that at a low enough supply voltage, the latter factor may start to 
dominate and cause energy to increase with a further drop in the supply voltage. 

Example 5.13 Transistor Sizing for Energy Minimization 

To analyze the transistor-sizing for minimum energy problem, we examine the simple case of
a static CMOS inverter driving an external load capacitance $C_{ext}$. To take the input loading
effects into account, we assume that the inverter itself is driven by a minimum-sized device
(Figure 5.28). The goal is to minimize the energy dissipation of the complete circuit, while
maintaining a lower-bound on performance. The degrees of freedom are the size factor $f$ of the
inverter and the supply voltage $V_{DD}$ of the circuit. The propagation delay of the optimized
circuit should not be larger than that of a reference circuit, chosen to have as parameters
$f = 1$ and $V_{DD} = V_{ref}$.

Using the approach introduced in Section 5.4.3 (Sizing a Chain of Inverters), we can
derive an expression for the propagation delay of the circuit.

Figure 5.28 CMOS inverter driving an external
load capacitance $C_{ext}$, while being driven by a
minimum sized gate.

$$t_p = t_{p0}\left(\left(1 + \frac{f}{\gamma}\right) + \left(1 + \frac{F}{f\gamma}\right)\right) \tag{5.43}$$

with $F = C_{ext}/C_{g1}$ the overall effective fanout of the circuit. $t_{p0}$ is the intrinsic delay of the
inverter. Its dependence upon $V_{DD}$ is approximated by the following expression, derived from
Eq. (5.21).

$$t_{p0} \sim \frac{V_{DD}}{V_{DD} - V_{TE}} \tag{5.44}$$

The energy dissipation for a single transition at the input is easily found once the total
capacitance of the circuit is known, or

$$E = V_{DD}^2\, C_{g1}\big((1 + \gamma)(1 + f) + F\big) \tag{5.45}$$

The performance constraint now states that the propagation delay of the scaled circuit should
be equal to (or smaller than) the delay of the reference circuit ($f = 1$, $V_{DD} = V_{ref}$). To simplify
the subsequent analysis, we assume that the intrinsic output capacitance of
the gate equals its gate capacitance, or $\gamma = 1$. Hence,

$$\frac{t_p}{t_{pref}} = \frac{t_{p0}}{t_{p0ref}}\left(\frac{2 + f + F/f}{3 + F}\right)
   = \left(\frac{V_{DD}}{V_{ref}}\right)\left(\frac{V_{ref} - V_{TE}}{V_{DD} - V_{TE}}\right)\left(\frac{2 + f + F/f}{3 + F}\right) = 1 \tag{5.46}$$



Eq. (5.46) establishes a relationship between the sizing factor $f$ and the supply voltage, plotted
in Figure 5.29a for different values of $F$. Those curves show a clear minimum. Increasing the
size of the inverter from the minimum initially increases the performance, and hence allows
for a lowering of the supply voltage. This is fruitful until the optimum sizing factor of
$f = \sqrt{F}$ is reached, which should not surprise careful readers of the previous sections.
Further increases in the device sizes only increase the self-loading factor, deteriorate the
performance, and require an increase in supply voltage. Also observe that for the case of $F = 1$, the
reference case is the best solution; any resizing just increases the self-loading.





Figure 5.29 Sizing of an inverter for energy minimization. (a) Required supply voltage as a function of the sizing factor $f$ for different values of the overall effective fanout $F$; (b) energy of the scaled circuit (normalized with respect to the reference case) as a function of $f$. $V_{ref} = 2.5$ V, $V_{TE} = 0.5$ V.

With the $V_{DD}(f)$ relationship in hand, we can derive the energy of the scaled circuit (normalized with respect to the reference circuit) as a function of the sizing factor $f$.

$$\frac{E}{E_{ref}} = \left(\frac{V_{DD}}{V_{ref}}\right)^2 \left(\frac{2 + 2f + F}{4 + F}\right) \tag{5.47}$$



Finding an analytical expression for the optimal sizing factor is possible, but yields a complex 
and messy equation. A graphical approach is just as effective. The resulting charts are plotted 
in Figure 5.29b, from which a number of conclusions can be drawn: 

• Device sizing, combined with supply voltage reduction, is a very effective approach in reducing the energy consumption of a logic network. This is especially true for networks with large effective fanouts, where energy reductions of almost a factor of 10 can be observed. But the gain is also sizable for smaller values of $F$. The only exception is the $F = 1$ case, where the minimum-size device is also the most effective one.



• Oversizing the transistors beyond the optimal value comes at a hefty price in energy. This 
is unfortunately a common approach in many of today’s designs. 

• The optimal sizing factor for energy is smaller than the one for performance, especially for large values of $F$. For example, for a fanout of 20, $f_{opt}(\text{energy}) = 3.53$, while $f_{opt}(\text{performance}) = 4.47$. Increasing the device sizes only leads to a minimal supply reduction once $V_{DD}$ starts approaching $V_{TE}$, hence leading to very minimal energy gains.
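The sizing analysis above can be reproduced numerically. The following is a minimal sketch, assuming $\gamma = 1$, $V_{ref} = 2.5$ V and $V_{TE} = 0.5$ V as in Figure 5.29; the function names and the grid-search approach are illustrative choices, not part of the original text. It solves Eq. (5.46) for $V_{DD}$ and evaluates the normalized energy of Eq. (5.47).

```python
import math


def required_vdd(f, F, v_ref=2.5, v_te=0.5):
    """Supply voltage that keeps the delay equal to the reference (Eq. 5.46, gamma = 1)."""
    g = (2.0 + f + F / f) / (3.0 + F)  # delay ratio at equal supply voltages
    denom = v_ref - (v_ref - v_te) * g
    if denom <= 0:
        return math.inf  # no finite supply voltage meets the delay target
    return v_ref * v_te / denom


def normalized_energy(f, F, v_ref=2.5, v_te=0.5):
    """Energy of the scaled circuit relative to the reference circuit (Eq. 5.47)."""
    vdd = required_vdd(f, F, v_ref, v_te)
    return (vdd / v_ref) ** 2 * (2.0 + 2.0 * f + F) / (4.0 + F)


def optimal_sizing(F, f_min=1.0, f_max=10.0, steps=9000):
    """Grid search for the sizing factor with the lowest normalized energy."""
    candidates = [f_min + i * (f_max - f_min) / steps for i in range(steps + 1)]
    return min(candidates, key=lambda f: normalized_energy(f, F))


if __name__ == "__main__":
    F = 20
    f_energy = optimal_sizing(F)
    print(f"f_opt(energy) ~ {f_energy:.2f}, f_opt(performance) = {math.sqrt(F):.2f}")
    print(f"E/E_ref at f_opt(energy) ~ {normalized_energy(f_energy, F):.3f}")
```

For $F = 20$ the search lands close to the energy optimum quoted in the text, below the performance optimum of $\sqrt{20}$; for $F = 1$ it returns the reference sizing $f = 1$.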





Dissipation Due to Direct-Path Currents 

In actual designs, the assumption of zero rise and fall times of the input waveforms is not correct. The finite slope of the input signal causes a direct current path between $V_{DD}$ and GND for a short period of time during switching, while the NMOS and the PMOS transistors are conducting simultaneously. This is illustrated in Figure 5.30. Under the (reasonable) assumption that the resulting current spikes can be approximated as triangles and that the inverter is symmetrical in its rising and falling responses, we can compute the energy consumed per switching period,

$$E_{dp} = V_{DD}\frac{I_{peak}\,t_{sc}}{2} + V_{DD}\frac{I_{peak}\,t_{sc}}{2} = t_{sc} V_{DD} I_{peak} \tag{5.48}$$

as well as the average power consumption

$$P_{dp} = t_{sc} V_{DD} I_{peak} f = C_{sc} V_{DD}^2 f \tag{5.49}$$

The direct-path power dissipation is proportional to the switching activity, similar to the capacitive power dissipation. $t_{sc}$ represents the time both devices are conducting. For a linear input slope, this time is reasonably well approximated by Eq. (5.50), where $t_s$ represents the 0-100% transition time.

Figure 5.30 Short-circuit currents during transients.

$$t_{sc} = \frac{V_{DD} - 2V_T}{V_{DD}}\,t_s \approx \frac{V_{DD} - 2V_T}{V_{DD}} \times \frac{t_{r(f)}}{0.8} \tag{5.50}$$
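As a rough illustration of Eqs. (5.48) through (5.50), the sketch below evaluates the short-circuit time, energy, and power for one operating point. The peak current and switching frequency are hypothetical values picked only for the example, the function names are not from the text, and $t_{r(f)}$ is assumed to be a 10%-90% rise (fall) time, which the text does not state explicitly.

```python
def short_circuit_time(vdd, vt, t_s):
    """Time during which both devices conduct, per Eq. (5.50).

    vdd: supply voltage (V); vt: threshold magnitude, assumed equal for
    NMOS and PMOS (V); t_s: 0-100% input transition time (s).
    """
    if vdd <= 2.0 * vt:
        return 0.0  # the two transistors are never on simultaneously
    return (vdd - 2.0 * vt) / vdd * t_s


def transition_time_from_rise_time(t_rise):
    """Convert an (assumed 10%-90%) rise or fall time into a 0-100% transition time."""
    return t_rise / 0.8


def direct_path_energy(vdd, vt, t_s, i_peak):
    """Energy per switching period for triangular current spikes, Eq. (5.48)."""
    return short_circuit_time(vdd, vt, t_s) * vdd * i_peak


def direct_path_power(vdd, vt, t_s, i_peak, freq):
    """Average direct-path power, Eq. (5.49)."""
    return direct_path_energy(vdd, vt, t_s, i_peak) * freq


if __name__ == "__main__":
    # Hypothetical operating point, chosen only for illustration.
    vdd, vt, t_s, i_peak, freq = 2.5, 0.5, 500e-12, 100e-6, 100e6
    print(f"t_sc = {short_circuit_time(vdd, vt, t_s) * 1e12:.0f} ps")
    print(f"E_dp = {direct_path_energy(vdd, vt, t_s, i_peak) * 1e15:.1f} fJ")
    print(f"P_dp = {direct_path_power(vdd, vt, t_s, i_peak, freq) * 1e6:.2f} uW")
```

The early return for `vdd <= 2 * vt` mirrors the observation that short-circuit dissipation vanishes when the supply is below the sum of the threshold magnitudes.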



$I_{peak}$ is determined by the saturation current of the devices and is hence directly proportional to the sizes of the transistors. The peak current is also a strong function of the ratio between input and output slopes. This relationship is best illustrated by the following simple analysis: Consider a static CMOS inverter with a 0 → 1 transition at the input. Assume first that the load capacitance is very large, so that the output fall time is significantly larger than the input rise time (Figure 5.31a). Under those circumstances, the input moves through the transient region before the output starts to change. As the source-drain voltage of the PMOS device is approximately 0 during that period, the device shuts off without ever delivering any current. The short-circuit current is close to zero in this case. Consider now the reverse case, where the output capacitance is very small, and the output fall time is substantially smaller than the input rise time (Figure 5.31b). The drain-source voltage of the PMOS device equals $V_{DD}$ for most of the transition period, guaranteeing the maximal short-circuit current (equal to the saturation current of the PMOS). This clearly represents the worst-case condition. The conclusions of the above analysis are confirmed in Figure 5.32, which plots the short-circuit current through the NMOS transistor during a low-to-high transition as a function of the load capacitance.

Figure 5.31 Impact of load capacitance on short-circuit current. (a) Large capacitive load; (b) small capacitive load.

Figure 5.32 CMOS inverter short-circuit current through NMOS transistor as a function of the load capacitance (for a fixed input slope of 500 psec). [Horizontal axis: time (sec).]



This analysis leads to the conclusion that the short-circuit dissipation is minimized 
by making the output rise/fall time larger than the input rise/fall time. On the other hand, 
making the output rise/fall time too large slows down the circuit and can cause short-cir- 
cuit currents in the fan-out gates. This presents a perfect example of how local optimiza- 
tion and forgetting the global picture can lead to an inferior solution. 






Design Techniques 






A more practical rule, which optimizes the power consumption in a global way, can be formulated [Veendrick84]:

The power dissipation due to short-circuit currents is minimized by matching the rise/fall 
times of the input and output signals. At the overall circuit level, this means that rise/fall 
times of all signals should be kept constant within a range. 

Making the input and output rise times of a gate identical is not the optimum solution for 
that particular gate on its own, but keeps the overall short-circuit current within bounds. This is 
shown in Figure 5.33, which plots the short-circuit energy dissipation of an inverter (normal- 
ized with respect to the zero-input rise time dissipation) as a function of the ratio r between 
input and output rise/fall times. When the load capacitance is too small for a given inverter size ($r > 2 \ldots 3$ for $V_{DD} = 5$ V), the power is dominated by the short-circuit current. For very large
capacitance values, all power dissipation is devoted to charging and discharging the load 
capacitance. When the rise/fall times of inputs and outputs are equalized, most power dissipa- 
tion is associated with the dynamic power and only a minor fraction (< 10%) is devoted to 
short-circuit currents. 

Observe also that the impact of short-circuit current is reduced when we lower the supply voltage, as is apparent from Eq. (5.50). In the extreme case, when $V_{DD} < V_{Tn} + |V_{Tp}|$, short-circuit dissipation is completely eliminated, because both devices are never on simultaneously. With threshold voltages scaling at a slower rate than the supply voltage, short-circuit power dissipation is becoming of lesser importance in deep-submicron technologies. At a supply voltage of 2.5 V and thresholds around 0.5 V, an input/output slope ratio of 2 is needed to cause a 10% degradation in dissipation.

Figure 5.33 Power dissipation of a static CMOS inverter as a function of the ratio between input and output rise/fall times ($t_{sin}/t_{sout}$). The power is normalized with respect to zero input rise-time dissipation. At low values of the slope ratio, input-output coupling leads to some extra dissipation. Simulation parameters: $(W/L)_p = 1.125\,\mu\text{m}/0.25\,\mu\text{m}$, $(W/L)_n = 0.375\,\mu\text{m}/0.25\,\mu\text{m}$, $C_L = 30$ fF.






Finally, it is worth observing that the short-circuit power dissipation can be modeled by adding a load capacitance $C_{sc} = t_{sc} I_{peak}/V_{DD}$ in parallel with $C_L$, as is apparent in Eq. (5.49). The value of this short-circuit capacitance is a function of $V_{DD}$, the transistor sizes, and the input-output slope ratio.

5.5.2 Static Consumption 

The static (or steady-state) power dissipation of a circuit is expressed by Eq. (5.51), where $I_{stat}$ is the current that flows between the supply rails in the absence of switching activity.

$$P_{stat} = I_{stat} V_{DD} \tag{5.51}$$

Ideally, the static current of the CMOS inverter is equal to zero, as the PMOS and NMOS devices are never on simultaneously in steady-state operation. There is, unfortunately, a leakage current flowing through the reverse-biased diode junctions of the transistors, located between the source or drain and the substrate, as shown in Figure 5.34. This contribution is, in general, very small and can be ignored. For the device sizes under consideration, the leakage current per unit drain area typically ranges between 10 and 100 pA/µm² at room temperature. For a die with 1 million gates, each with a drain area of 0.5 µm² and operated at a supply voltage of 2.5 V, the worst-case power consumption due to diode leakage equals 0.125 mW, which is clearly not much of an issue.

However, be aware that the junction leakage currents are caused by thermally generated carriers. Their value increases with increasing junction temperature, and this occurs in an exponential fashion. At 85°C (a common junction temperature limit for commercial hardware), the leakage currents increase by a factor of 60 over their room-temperature values. Keeping the overall operation temperature of a circuit low is consequently a desirable goal. As the temperature is a strong function of the dissipated heat and its removal mechanisms, this can only be accomplished by limiting the power dissipation of the circuit and/or by using chip packages that support efficient heat removal.

Figure 5.34 Sources of leakage currents in CMOS inverter (for $V_{in} = 0$ V).
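The diode-leakage estimate quoted above is simple arithmetic, and can be sketched as follows. The function name is illustrative; the 85°C figure is obtained here by applying the stated factor of 60 to the room-temperature worst case, which is a derived number rather than one given in the text.

```python
def diode_leakage_power(n_gates, drain_area_um2, j_leak_pa_per_um2, vdd):
    """Static power due to junction leakage.

    n_gates: number of gates on the die
    drain_area_um2: drain area per gate (square micrometers)
    j_leak_pa_per_um2: leakage current density (pA per square micrometer)
    vdd: supply voltage (V)
    """
    i_total = n_gates * drain_area_um2 * j_leak_pa_per_um2 * 1e-12  # amperes
    return i_total * vdd


if __name__ == "__main__":
    room = diode_leakage_power(1_000_000, 0.5, 100.0, 2.5)  # worst-case density
    hot = 60 * room  # factor of 60 at 85 degrees C, as stated in the text
    print(f"room temperature: {room * 1e3:.3f} mW")
    print(f"85 degrees C:     {hot * 1e3:.2f} mW")
```

Even after the sixty-fold increase the junction leakage of this example die stays in the milliwatt range.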

An emerging source of leakage current is the subthreshold current of the transistors. As discussed in Chapter 3, an MOS transistor can experience a drain-source current, even when $V_{GS}$ is smaller than the threshold voltage (Figure 5.35). The closer the threshold voltage is to zero volts, the larger the leakage current at $V_{GS} = 0$ V and the larger the static power consumption. To offset this effect, the threshold voltage of the device has generally been kept high enough. Standard processes feature $V_T$ values that are never smaller than 0.5-0.6 V and that in some cases are even substantially higher (~0.75 V).

This approach is being challenged by the reduction in supply voltages that typically 
goes with deep-submicron technology scaling as became apparent in Figure 3.40. We con- 
cluded earlier (Figure 5.17) that scaling the supply voltages while keeping the threshold 
voltage constant results in an important loss in performance, especially when $V_{DD}$ approaches $2V_T$. One approach to address this performance issue is to scale the device
thresholds down as well. This moves the curve of Figure 5.17 to the left, which means that 
the performance penalty for lowering the supply voltage is reduced. Unfortunately, the 
threshold voltages are lower-bounded by the amount of allowable subthreshold leakage 
current, as demonstrated in Figure 5.35. The choice of the threshold voltage hence repre- 
sents a trade-off between performance and static power dissipation. The continued scaling 
of the supply voltage predicted for the next generations of CMOS technologies however 
forces the threshold voltages ever downwards, and makes subthreshold conduction a dom- 
inant source of power dissipation. Process technologies that contain devices with sharper 
turn-off characteristic will therefore become more attractive. An example of the latter is 
the SOI (Silicon-on-Insulator) technology whose MOS transistors have slope-factors that 
are close to the ideal 60 mV/decade. 




Example 5.14 Impact of threshold reduction on performance and static power dissipation

Consider a minimum-size NMOS transistor in the 0.25 µm CMOS technology. In Chapter 3, we derived that the slope factor $S$ for this device equals 90 mV/decade. The off-current (at $V_{GS} = 0$) of the transistor for a $V_T$ of approximately 0.5 V equals $10^{-11}$ A (Figure 3.22). Reducing the threshold by 200 mV to 0.3 V multiplies the off-current of the transistors by a factor of 170! Assuming a million-gate design with a supply voltage of 1.5 V, this translates into a static power dissipation of $10^6 \times 170 \times 10^{-11} \times 1.5 = 2.6$ mW. A further reduction of the threshold to 100 mV results in an unacceptable dissipation of almost 0.5 W! At that supply voltage, the threshold reductions correspond to a performance improvement of 25% and 40%, respectively.
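The numbers in Example 5.14 follow from the exponential dependence of the off-current on the threshold voltage. The sketch below is a hedged illustration: it assumes one leaking transistor per gate (as the example implicitly does) and uses illustrative function names. The unrounded multiplier comes out slightly below the 170 quoted in the text, so the computed power is correspondingly a little lower.

```python
def off_current(i_off_ref, delta_vt, slope):
    """Off-current after lowering the threshold by delta_vt.

    i_off_ref: off-current at the original threshold (A)
    delta_vt: threshold reduction (V)
    slope: subthreshold slope factor S (V per decade of current)
    """
    return i_off_ref * 10 ** (delta_vt / slope)


def static_power(n_gates, i_off, vdd):
    """Static power, assuming one leaking transistor per gate."""
    return n_gates * i_off * vdd


if __name__ == "__main__":
    i_ref, slope, vdd, n_gates = 1e-11, 0.090, 1.5, 1_000_000
    for delta in (0.2, 0.4):  # thresholds of 0.3 V and 0.1 V
        i_off = off_current(i_ref, delta, slope)
        p = static_power(n_gates, i_off, vdd)
        print(f"delta V_T = {delta:.1f} V: multiplier ~ {i_off / i_ref:.0f}, "
              f"P_stat ~ {p * 1e3:.1f} mW")
```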



This lower bound on the thresholds is in some sense artificial. The idea that the leakage current in a static CMOS circuit has to be zero is a preconception. Certainly, the presence of leakage currents degrades the noise margins, because the logic levels are no longer equal to the supply rails. As long as the noise margins are within range, this is not a compelling issue. The leakage currents, of course, cause an increase in static power dissipation. This is offset by the drop in supply voltage that is enabled by the reduced thresholds at no cost in performance, and that results in a quadratic reduction in dynamic power. For a 0.25 µm CMOS process, the following circuit configurations obtain the same performance: a 3 V supply with a 0.7 V $V_T$, and a 0.45 V supply with a 0.1 V $V_T$. The dynamic power consumption of the latter is, however, 45 times smaller [Liu93]! Choosing the correct values of supply and threshold voltages once again requires a trade-off. The optimal operation point depends upon the activity of the circuit. In the presence of a sizable static power dissipation, it is essential that non-active modules are powered down, lest static power dissipation become dominant. Power-down (also called standby) can be accomplished by disconnecting the unit from the supply rails, or by lowering the supply voltage.

5.5.3 Putting It All Together 

The total power consumption of the CMOS inverter is now expressed as the sum of its three components:

$$P_{tot} = P_{dyn} + P_{dp} + P_{stat} = \left(C_L V_{DD}^2 + V_{DD} I_{peak} t_{sc}\right) f_{0\to1} + V_{DD} I_{leak} \tag{5.52}$$

In typical CMOS circuits, the capacitive dissipation is by far the dominant factor. The direct-path consumption can be kept within bounds by careful design, and should hence not be an issue. Leakage is ignorable at present, but this might change in the not too distant future.

The Power-Delay Product, or Energy per Operation

In Chapter 1, we introduced the power-delay product (PDP) as a quality measure for a logic gate.

$$PDP = P_{av}\,t_p \tag{5.53}$$

The PDP presents a measure of energy, as is apparent from the units (W·sec = Joule). Assuming that the gate is switched at its maximum possible rate of $f_{max} = 1/(2t_p)$, and ignoring the contributions of the static and direct-path currents to the power consumption, we find

$$PDP = C_L V_{DD}^2 f_{max} t_p = \frac{C_L V_{DD}^2}{2} \tag{5.54}$$

The PDP stands for the average energy consumed per switching event (that is, for a 0 → 1 or a 1 → 0 transition). Remember that earlier we had defined $E_{av}$ as the average energy per switching cycle (or per energy-consuming event). As each inverter cycle contains a 0 → 1 and a 1 → 0 transition, $E_{av}$ is twice the PDP.

Energy-Delay Product 

The validity of the PDP as a quality metric for a process technology or gate topology is questionable. It measures the energy needed to switch the gate, which is an important property for sure. Yet for a given structure, this number can be made arbitrarily low by reducing the supply voltage. From this perspective, the optimum voltage to run the circuit at would be the lowest possible value that still ensures functionality. This comes at a major expense in performance, as discussed earlier. A more relevant metric should combine a measure of performance and energy. The energy-delay product (EDP) does exactly that.

$$EDP = PDP \times t_p = P_{av}\,t_p^2 = \frac{C_L V_{DD}^2}{2}\,t_p \tag{5.55}$$

It is worth analyzing the voltage dependence of the EDP. Higher supply voltages reduce delay, but harm the energy, and the opposite is true for low voltages. An optimum operation point should hence exist. Assuming that NMOS and PMOS transistors have comparable threshold and saturation voltages, we can simplify the propagation delay expression Eq. (5.21).

$$t_p \approx \frac{\alpha\,C_L V_{DD}}{V_{DD} - V_{TE}} \tag{5.56}$$



where $V_{TE} = V_T + V_{DSAT}/2$, and $\alpha$ is a technology parameter. Combining Eq. (5.55) and Eq. (5.56),⁴

$$EDP = \frac{\alpha\,C_L^2 V_{DD}^3}{2\,(V_{DD} - V_{TE})} \tag{5.57}$$

⁴ This equation is only accurate as long as the devices remain in velocity saturation, which is probably not the case for the lower supply voltages. This introduces some inaccuracy in the analysis, but will not distort the overall result.



The optimum supply voltage can be obtained by taking the derivative of Eq. (5.57) with respect to $V_{DD}$, and equating the result to 0.

$$V_{DDopt} = \frac{3}{2}\,V_{TE} \tag{5.58}$$

The remarkable outcome from this analysis is the low value of the supply voltage that simultaneously optimizes performance and energy. For sub-micron technologies with thresholds in the range of 0.5 V, the optimum supply is situated around 1 V.
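For readers who want the intermediate step, the differentiation behind Eq. (5.58) can be written out as follows. This is a worked sketch that treats $\alpha$, $C_L$, and $V_{TE}$ as constants independent of $V_{DD}$, an assumption made only for this derivation.

```latex
\frac{d\,EDP}{dV_{DD}}
  = \frac{\alpha C_L^2}{2}\,
    \frac{3V_{DD}^2\,(V_{DD}-V_{TE}) - V_{DD}^3}{(V_{DD}-V_{TE})^2}
  = \frac{\alpha C_L^2}{2}\,
    \frac{V_{DD}^2\,(2V_{DD}-3V_{TE})}{(V_{DD}-V_{TE})^2}
  = 0
\quad\Longrightarrow\quad
V_{DDopt} = \tfrac{3}{2}\,V_{TE}
```

The derivative is negative for $V_{TE} < V_{DD} < \tfrac{3}{2}V_{TE}$ and positive above it, so the stationary point is indeed a minimum.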



Example 5.15 Optimum supply voltage for 0.25 µm CMOS inverter

From the technology parameters for our generic CMOS process presented in Chapter 3, the value of $V_{TE}$ can be derived.

$V_{Tn} = 0.43$ V, $V_{Dsatn} = 0.63$ V, $V_{TEn} = 0.74$ V.

$V_{Tp} = -0.4$ V, $V_{Dsatp} = -1$ V, $V_{TEp} = -0.9$ V.

$V_{TE} = (V_{TEn} + |V_{TEp}|)/2 = 0.8$ V

Hence, $V_{DDopt} = (3/2) \times 0.8$ V $= 1.2$ V. The simulated graphs of Figure 5.36, plotting normalized delay, energy, and energy-delay product, confirm this result: the simulations predict an optimum supply voltage of 1.1 V. The charts clearly illustrate the trade-off between delay and energy.

Figure 5.36 Normalized delay, energy, and energy-delay plots for CMOS inverter in 0.25 µm CMOS technology. [Horizontal axis: $V_{DD}$ (V).]
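The optimum of Example 5.15 can also be located numerically. The sketch below minimizes the voltage-dependent part of Eq. (5.57), $V_{DD}^3/(V_{DD} - V_{TE})$, by a simple grid search; the function names and the search range are illustrative assumptions, and the constant factor $\alpha C_L^2/2$ is dropped because it does not move the minimum.

```python
def effective_threshold(v_t, v_dsat):
    """V_TE = |V_T| + |V_DSAT| / 2 for a single device."""
    return abs(v_t) + abs(v_dsat) / 2.0


def normalized_edp(vdd, v_te):
    """Voltage-dependent part of the energy-delay product, Eq. (5.57)."""
    return vdd ** 3 / (vdd - v_te)


def optimum_vdd(v_te, v_max=2.5, steps=20000):
    """Grid search for the supply voltage that minimizes the EDP."""
    step = (v_max - v_te) / steps
    candidates = [v_te + (i + 1) * step for i in range(steps)]
    return min(candidates, key=lambda v: normalized_edp(v, v_te))


if __name__ == "__main__":
    v_te_n = effective_threshold(0.43, 0.63)
    v_te_p = effective_threshold(-0.4, -1.0)
    v_te = (v_te_n + v_te_p) / 2.0
    print(f"V_TE ~ {v_te:.2f} V, numerical optimum ~ {optimum_vdd(v_te):.2f} V")
    print(f"with V_TE rounded to 0.8 V: {optimum_vdd(0.8):.2f} V")
```

With $V_{TE}$ rounded to 0.8 V, as in the example, the search returns the analytical value of 1.2 V.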



WARNING: While the above example demonstrates that there exists a supply voltage that minimizes the energy-delay product of a gate, this voltage does not necessarily represent the optimum voltage for a given design problem. For instance, some designs require a minimum performance, which requires a higher voltage at the expense of energy. Similarly, a lower-energy design is possible by operating the circuit at a lower voltage and by obtaining the overall system performance through the use of architectural techniques such as pipelining or concurrency.





5.5.4 Analyzing Power Consumption Using SPICE 

A definition of the average power consumption of a circuit was provided in Chapter 1, and 
is repeated here for the sake of convenience. 

$$P_{av} = \frac{1}{T}\int_0^T p(t)\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt \tag{5.59}$$

with $T$ the period of interest, and $V_{DD}$ and $i_{DD}$ the supply voltage and current, respectively.
Some implementations of SPICE provide built-in functions to measure the average value 
of a circuit signal. For instance, the HSPICE .MEASURE TRAN I(VDD) AVG command 
computes the area under a computed transient response ( I(VDD )) and divides it by the 
period of interest. This is identical to the definition given in Eq. (5.59). Other implementa- 
tions of SPICE are, unfortunately, not as extensive. This is not as bad as it seems, as long 
as one realizes that SPICE is actually a differential equation solver. A small circuit can 
easily be conceived that acts as an integrator and whose output signal is nothing but the 
average power. 

Consider, for instance, the circuit of Figure 5.37. The current delivered by the power supply is measured by the current-controlled current source and integrated on the capacitor $C$. The resistance $R$ is only provided for DC-convergence reasons and should be chosen as high as possible to minimize leakage. A clever choice of the element parameters ensures that the output voltage $P_{av}$ equals the average power consumption. The operation of the circuit is summarized in Eq. (5.60) under the assumption that the initial voltage on the capacitor $C$ is zero.

$$C\,\frac{dP_{av}}{dt} = k\,i_{DD} \quad\text{or}\quad P_{av} = \frac{k}{C}\int_0^T i_{DD}\,dt \tag{5.60}$$

Equating Eq. (5.59) and Eq. (5.60) yields the necessary conditions for the equivalent circuit parameters: $k/C = V_{DD}/T$. Under these circumstances, the equivalent circuit shown presents a convenient means of tracking the average power in a digital circuit.



Example 5.16 Average Power of Inverter 

The average power consumption of the inverter of Example 5.4 is analyzed using the above technique for a toggle period of 250 psec ($T = 250$ psec, $k = 1$, $V_{DD} = 2.5$ V, hence $C = 100$ pF). The resulting power consumption is plotted in Figure 5.38, showing an average power consumption of approximately 157.3 µW. The .MEAS AVG command yields a value of 160.32 µW, which demonstrates the approximate equivalence of both methods. These numbers are equivalent to an energy of 39 fJ (which is close to the 37.5 fJ derived in Example 5.11). Observe the slightly negative dip during the high-to-low transition. This is due to the injection of current into the supply, when the output briefly overshoots $V_{DD}$ as a result of the capacitive coupling between input and output (as is apparent in the transient response of Figure 5.16).

Figure 5.37 Equivalent circuit to measure average power in SPICE.

Figure 5.38 Deriving the power consumption using SPICE. [Curve label: average power (over one cycle).]
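The equivalence between the direct definition of Eq. (5.59) and the integrator circuit of Eq. (5.60) can be checked outside of SPICE. The sketch below is an illustration under stated assumptions: the supply-current waveform is a made-up triangular pulse (not the simulated inverter current), the integration uses the trapezoidal rule, and all function names are hypothetical.

```python
def integrator_capacitance(k, period, vdd):
    """Capacitor value that satisfies k / C = V_DD / T."""
    return k * period / vdd


def delivered_charge(times, currents):
    """Trapezoidal integral of the supply current over the sampled window."""
    return sum(0.5 * (currents[i] + currents[i + 1]) * (times[i + 1] - times[i])
               for i in range(len(times) - 1))


def average_power(times, currents, vdd):
    """Average power according to Eq. (5.59)."""
    period = times[-1] - times[0]
    return vdd * delivered_charge(times, currents) / period


def integrator_output(times, currents, k, c):
    """Capacitor voltage at the end of the window, Eq. (5.60), zero initial charge."""
    return k * delivered_charge(times, currents) / c


if __name__ == "__main__":
    vdd, period, k = 2.5, 250e-12, 1.0
    c = integrator_capacitance(k, period, vdd)
    # Synthetic triangular current pulse, for illustration only.
    times = [0.0, 25e-12, 50e-12, 250e-12]
    currents = [0.0, 200e-6, 0.0, 0.0]
    print(f"C = {c * 1e12:.0f} pF")
    print(f"P_av (definition) = {average_power(times, currents, vdd) * 1e6:.1f} uW")
    print(f"P_av (integrator) = {integrator_output(times, currents, k, c) * 1e6:.1f} uW")
```

Both routes give the same number by construction, and multiplying an average power by the period recovers the energy per cycle (157.3 µW over 250 psec is about 39 fJ).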



5.6 Perspective: Technology Scaling and its Impact on the Inverter Metrics

In Section 3.5, we explored the impact of technology scaling on some of the important design parameters such as area, delay, and power. For the sake of clarity, we repeat here some of the most important entries in the resulting scaling table (Table 3.8).



Table 5.4 Scaling scenarios for short-channel devices (S and U represent the technology and voltage scaling parameters, respectively).

| Parameter        | Relation          | Full Scaling | General Scaling | Fixed-Voltage Scaling |
|------------------|-------------------|--------------|-----------------|-----------------------|
| Area / Device    | WL                | 1/S^2        | 1/S^2           | 1/S^2                 |
| Intrinsic Delay  | R_on C_gate       | 1/S          | 1/S             | 1/S                   |
| Intrinsic Energy | C_gate V^2        | 1/S^3        | 1/(S U^2)       | 1/S                   |
| Intrinsic Power  | Energy / Delay    | 1/S^2        | 1/U^2           | 1                     |
| Power Density    | P / Area          | 1            | S^2/U^2         | S^2                   |
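The derived rows of Table 5.4 follow mechanically from the first three. The sketch below is a small consistency check with illustrative names: it starts from the general-scaling expressions for area, delay, and energy, derives power and power density from them, and then specializes to full scaling ($U = S$) and fixed-voltage scaling ($U = 1$).

```python
def scaling_factors(S, U):
    """Scaling of the intrinsic inverter metrics under general scaling.

    S: technology (dimension) scaling factor; U: voltage scaling factor.
    Returns the factor by which each metric is multiplied.
    """
    area = 1.0 / S ** 2
    delay = 1.0 / S
    energy = 1.0 / (S * U ** 2)
    power = energy / delay          # Energy / Delay
    density = power / area          # P / Area
    return {"area": area, "delay": delay, "energy": energy,
            "power": power, "power_density": density}


if __name__ == "__main__":
    S = 2.0
    for name, U in (("full scaling", S), ("fixed-voltage scaling", 1.0)):
        print(name, scaling_factors(S, U))
```

Setting $U = S$ reproduces the constant power density of the full-scaling column, while $U = 1$ gives the $S^2$ growth of the fixed-voltage column.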




The validity of these theoretical projections can be verified by looking back and observing the trends during the past decades. From Figure 5.39, we can derive that the gate delay indeed decreases exponentially at a rate of 13%/year, or halving every five years. This rate is on course with the prediction of Table 5.4, since S averages approximately 1.15 as we had already observed in Figure 3.39. The delay of a 2-input NAND gate with a fanout of four has gone from tens of nanoseconds in the 1960s to a tenth of a nanosecond in the year 2000, and is projected to be a few tens of picoseconds by 2010.

Figure 5.39 Scaling of the gate delay (from [Dally98]).
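The link between the three numbers in this paragraph (a 13%/year decrease, halving every five years, and S of about 1.15) can be checked with a few lines. The sketch assumes that the quoted S is a per-year average and that the intrinsic delay scales as 1/S, as in Table 5.4; the function names are illustrative.

```python
import math


def yearly_decrease(s_per_year):
    """Fractional delay reduction per year when delay scales as 1/S."""
    return 1.0 - 1.0 / s_per_year


def halving_time(rate_per_year):
    """Years needed to halve a quantity that shrinks by rate_per_year annually."""
    return math.log(0.5) / math.log(1.0 - rate_per_year)


if __name__ == "__main__":
    print(f"decrease for S = 1.15: {yearly_decrease(1.15) * 100:.1f} % per year")
    print(f"halving time at 13 %/year: {halving_time(0.13):.2f} years")
```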

Reducing power dissipation has only been a second-order priority until recently. 
Hence, statistics on dissipation-per-gate or design are only marginally available. An inter- 
esting chart is shown in Figure 5.40, which plots the power density measured over a large 
number of designs produced between 1980 and 1995. Although the variation is 
large — even for a fixed technology — it shows the power density to increase approximately 
with S 2 . This is in correspondence with the fixed-voltage scaling scenario presented in 
Table 5.4. For more recent years, we expect a scenario more in line with the full-scaling 
model — which predicts a constant power density — due to the accelerated supply-voltage 
scaling and the increased attention to power-reducing design techniques. Even under these 
circumstances, power dissipation-per-chip will continue to increase due to the ever-larger 
die sizes. 

The presented scaling model has one fatal flaw, however: the performance and power predictions produce purely “intrinsic” numbers that take only device parameters into account. In Chapter 4, it was concluded that the interconnect wires exhibit a different scaling behavior, and that wire parasitics may come to dominate the overall performance. Similarly, charging and discharging the wire capacitances may dominate the energy budget. To get a crisper perspective, one has to construct a combined model that considers device and wire scaling models simultaneously. The impact of the wire capacitance and its scaling behavior is summarized in Table 5.5. We adopt the fixed-resistance model introduced in Chapter 4. We furthermore assume that the resistance of the driver dominates the wire resistance, which is definitely the case for short to medium-long wires.

Figure 5.40 Evolution of power density in micro- and DSP processors, as a function of the scaling factor S ([Sakurai97]). S is normalized to 1 for a 4 µm process.



Table 5.5 Scaling scenarios for wire capacitance. S and U represent the technology and voltage scaling parameters, respectively, while S_L stands for the wire-length scaling factor. ε_c represents the impact of fringing and inter-wire capacitances.

| Parameter                      | Relation      | General Scaling   |
|--------------------------------|---------------|-------------------|
| Wire Capacitance               | WL/t          | ε_c / S_L         |
| Wire Delay                     | R_on C_int    | ε_c / S_L         |
| Wire Energy                    | C_int V^2     | ε_c / (S_L U^2)   |
| Wire Delay / Intrinsic Delay   |               | ε_c S / S_L       |
| Wire Energy / Intrinsic Energy |               | ε_c S / S_L       |



The model predicts that the interconnect-caused delay (and energy) gain in importance with the scaling of technology. This impact is limited to an increase with ε_c for short wires (S = S_L), but it becomes increasingly more pronounced for medium-range and long wires (S_L < S). These conclusions have been confirmed by a number of studies, an example of which is shown in Figure 5.41. How the ratio of wire over intrinsic contributions will actually evolve is debatable, as it depends upon a wide range of independent parameters such as system architecture, design methodology, transistor sizing, and interconnect materials. The doomsday scenario that interconnect may cause CMOS performance to saturate in the very near future hence may be exaggerated. Yet, it is clear that increased attention to interconnect is an absolute necessity, and may change the way the next-generation circuits are designed and optimized (e.g., [Sylvester99]).










Figure 5.41 Evolution of wire delay / gate delay ratio with respect to technology (from [Fisher98]). [Horizontal axis: minimum feature size, from 250 nm down to 50 nm.]



5.7 Summary 

This chapter presented a rigorous and in-depth analysis of the static CMOS inverter. The 
key characteristics of the gate are summarized: 

• The static CMOS inverter combines a pull-up PMOS section with a pull-down 
NMOS device. The PMOS is normally made wider than the NMOS due to its infe- 
rior current-driving capabilities. 

• The gate has an almost ideal voltage-transfer characteristic. The logic swing is equal 
to the supply voltage and is not a function of the transistor sizes. The noise margins 
of a symmetrical inverter (where PMOS and NMOS transistor have equal current- 
driving strength) approach V DD /2. The steady-state response is not affected by fan- 
out. 

• Its propagation delay is dominated by the time it takes to charge or discharge the load capacitor $C_L$. To a first order, it can be approximated as

$$t_p = 0.69\left(\frac{R_{eqn} + R_{eqp}}{2}\right) C_L$$

Keeping the load capacitance small is the most effective means of implementing 
high-performance circuits. Transistor sizing may help to improve performance as 
long as the delay is dominated by the extrinsic (or load) capacitance of fanout and 
wiring. 

• The power dissipation is dominated by the dynamic power consumed in charging and discharging the load capacitor. It is given by $P_{0\to1} C_L V_{DD}^2 f$. The dissipation is proportional to the activity in the network. The dissipation due to the direct-path currents occurring during switching can be limited by careful tailoring of the signal slopes. The static dissipation can usually be ignored but might become a major factor in the future as a result of subthreshold currents.

• Scaling the technology is an effective means of reducing the area, propagation delay 
and power consumption of a gate. The impact is even more striking if the supply 
voltage is scaled simultaneously. 

• The interconnect component is gradually taking a larger fraction of the delay and 
performance budget. 




5.8 To Probe Further 

The operation of the CMOS inverter has been the topic of numerous publications and text- 
books. Virtually every book on digital design devotes a substantial number of pages to the 
analysis of the basic inverter gate. An extensive list of references was presented in Chapter 
1. Some references of particular interest that were explicitly quoted in this chapter are 
given below.

5.9 Exercises and Design Problems

For all problems, use the device parameters provided in Chapter 3 (as well as the inside back cover), 
unless otherwise mentioned. 




DESIGN PROBLEM 

Using the 1.2 µm CMOS introduced in Chapter 2, design a static CMOS inverter that meets the following requirements:

1. Matched pull-up and pull-down times (i.e., $t_{pHL} = t_{pLH}$).

2. $t_p = 5$ nsec (± 0.1 nsec).

The load capacitance connected to the output is equal to 4 pF. Notice that this capacitance is substantially larger than the internal capacitances of the gate.

Determine the W and L of the transistors. To reduce the parasitics, use minimal lengths (L = 1.2 µm) for all transistors. Verify and optimize the design using SPICE after proposing a first design using manual computations. Compute also the energy consumed per transition. If you have a layout editor (such as MAGIC) available, perform the physical design, extract the real circuit parameters, and compare the simulated results with the ones obtained earlier.

5.1 Exercises and Design Problems

1. [M, SPICE, 3.3.2] The layout of a static CMOS inverter is given in Figure 5.1 (λ = 0.125 µm).

a. Determine the sizes of the NMOS and PMOS transistors.

b. Plot the VTC (using HSPICE) and derive its parameters ($V_{OH}$, $V_{OL}$, $V_M$, $V_{IH}$, and $V_{IL}$).

c. Is the VTC affected when the output of the gates is connected to the inputs of 4 similar gates?

d. Resize the inverter to achieve a switching threshold of approximately 0.75 V. Do not lay out the new inverter; use HSPICE for your simulations. How are the noise margins affected by this modification?

[Figure 5.1: CMOS inverter layout; surviving labels: 2.5 V, PMOS.]

2. Figure 5.2 shows a piecewise linear approximation for the VTC. The transition region is 
approximated by a straight line with a slope equal to the inverter gain at V The intersection 
of this line with the VOH and the VOL lines defines V,H and VlL- 

a. The noise margins of a CMOS inverter are highly dependent on the sizing ratio, r = k p /k n , 
of the NMOS and PMOS transistors. Use HSPICE with V Tn = \V Tp \ to determine the value 
of r that results in equal noise margins? Give a qualitative explanation. 

b. Section 5.3.2 of the text uses this piecewise linear approximation to derive simplified
expressions for NM_H and NM_L in terms of the inverter gain. The derivation of the gain is
based on the assumption that both the NMOS and the PMOS devices are velocity saturated
at V_M. For what range of r is this assumption valid? What is the resulting range of V_M?

c. Derive expressions for the inverter gain at V_M for the cases when the sizing ratio is just
above and just below the limits of the range where both devices are velocity saturated.
What are the operating regions of the NMOS and the PMOS for each case? Consider the
effect of channel-length modulation by using the following expression for the small-signal
resistance in the saturation region: r_o,sat = 1/(λ I_D).

Figure 5.2 A different approach to derive
V_IL and V_IH.



3. [M, SPICE, 3.3.2] Figure 5.3 shows an NMOS inverter with resistive load.

a. Qualitatively discuss why this circuit behaves as an inverter.

b. Find V_OH and V_OL; calculate V_IH and V_IL.

c. Find NM_L and NM_H, and plot the VTC using HSPICE.

d. Compute the average power dissipation for: (i) V_in = 0 V and (ii) V_in = 2.5 V.




e. Use HSPICE to sketch the VTCs for R L = 37k, 75k, and 150k on a single graph. 

f. Comment on the relationship between the critical VTC voltages (i.e., V 0L , V OH , V IL , V IH ) 
and the load resistance, R L . 

g. Do high or low impedance loads seem to produce more ideal inverter characteristics? 

4. [E, None, 3.3.3] For the inverter of Figure 5.3 and an output load of 3 pF: 

a. Calculate t plh , t phl , and t p . 

b. Are the rising and falling delays equal? Why or why not? 

c. Compute the static and dynamic power dissipation assuming the gate is clocked as fast as 
possible. 

5. The next figure shows two implementations of MOS inverters. The first inverter uses only
NMOS transistors.

a. Calculate V_OH, V_OL, and V_M for each case.



[Figure: the two inverter implementations, each with V_DD = 2.5 V; labeled devices include M3 (W/L = 0.375/0.25) and M4 (W/L = 0.75/0.25), with output V_OUT.]



b. Use HSPICE to obtain the two VTCs. You must assume certain values for the source/drain
areas and perimeters since there is no layout. For our scalable CMOS process, λ = 0.125
µm, and the source/drain extensions are 5λ for the PMOS; for the NMOS the source/drain
contact regions are 5λ × 5λ.

c. Find V IH , V IL , NM L and NM H for each inverter and comment on the results. How can you 
increase the noise margins and reduce the undefined region? 

d. Comment on the differences in the VTCs, robustness and regeneration of each inverter. 

6. Consider the following NMOS inverter. Assume that the bulk terminals of all NMOS devices
are connected to GND. Assume that the input IN has a 0 V to 2.5 V swing.



V DD = 2.5V 




a. Set up the equation(s) to compute the voltage on node x. Assume γ = 0.5.

b. What are the modes of operation of device M2? Assume γ = 0.

c. What is the value on the output node OUT for the case when IN = 0 V? Assume γ = 0.

d. Assuming γ = 0, derive an expression for the switching threshold (V_M) of the inverter.
Recall that the switching threshold is the point where V_IN = V_OUT. Assume that the device
sizes for M1, M2 and M3 are (W/L)_1, (W/L)_2, and (W/L)_3 respectively. What are the limits
on the switching threshold?

For this, consider two cases:
i) (W/L)_1 » (W/L)_2
ii) (W/L)_2 » (W/L)_1

7. Consider the circuit in Figure 5.5. Device M1 is a standard NMOS device. Device M2 has all
the same properties as M1, except that its device threshold voltage is negative and has a value
of -0.4 V. Assume that all the current equations and inequality equations (to determine the
mode of operation) for the depletion device M2 are the same as a regular NMOS. Assume that
the input IN has a 0 V to 2.5 V swing.



[Figure 5.5 schematic: V_DD = 2.5 V; load device M2 (2 µm/1 µm), V_Tn = -0.4 V; driver device M1 (4 µm/1 µm); input IN, output OUT.]



Figure 5.5 A depletion load NMOS inverter 



a. Device M2 has its gate terminal connected to its source terminal. If V_IN = 0 V, what is the
output voltage? In steady state, what is the mode of operation of device M2 for this input?

b. Compute the output voltage for V_IN = 2.5 V. You may assume that V_OUT is small to simplify
your calculation. In steady state, what is the mode of operation of device M2 for this
input?

c. Assuming Pr(IN = 0) = 0.3, what is the static power dissipation of this circuit?

8. [M, None, 3.3.3] An NMOS transistor is used to charge a large capacitor, as shown in Figure 5.6.



a. Determine the t_pLH of this circuit, assuming an ideal step from 0 to 2.5 V at the input node.

b. Assume that a resistor R_s of 5 kΩ is used to discharge the capacitance to ground. Determine
t_pHL.

c. Determine how much energy is taken from the supply during the charging of the capacitor.
How much of this is dissipated in M1? How much is dissipated in the pull-down resistance
during discharge? How does this change when R_s is reduced to 1 kΩ?

d. The NMOS transistor is replaced by a PMOS device, sized so that k_p is equal to the k_n of
the original NMOS. Will the resulting structure be faster? Explain why or why not.



[Figure 5.6 schematic: NMOS transistor M1 (annotated ratio 20) between In and Out, with load capacitance C_L = 5 pF.]



Figure 5.6 Circuit diagram with annotated W/L ratios 



9. The circuit in Figure 5.7 is known as the source follower configuration. It achieves a DC level
shift between the input and the output. The value of this shift is determined by the current I_0.
Assume x_d = 0, γ = 0.4, 2|φ_f| = 0.6 V, V_T0 = 0.43 V, k_n' = 115 µA/V², and λ = 0.

[Figure 5.7 schematics: (a) M1 (1 µm/0.25 µm) loaded by M2 (L_D = 1 µm, V_bias = 0.55 V), V_DD = 2.5 V; (b) M2 replaced by an ideal current source.]

Figure 5.7 NMOS source follower configuration

a. Suppose we want the nominal level shift between V_i and V_o to be 0.6 V in the circuit in
Figure 5.7 (a). Neglecting the backgate effect, calculate the width of M2 to provide this
level shift (Hint: first relate V_i to V_o in terms of I_0).

b. Now assume that an ideal current source replaces M2 (Figure 5.7 (b)). The NMOS transistor
M1 experiences a shift in V_T due to the backgate effect. Find V_T as a function of V_o for
V_o ranging from 0 to 2.5 V with 0.5 V intervals. Plot V_T vs. V_o.

c. Plot V_o vs. V_i as V_o varies from 0 to 2.5 V with 0.5 V intervals. Plot two curves: one
neglecting the body effect and one accounting for it. How does the body effect influence
the operation of the level converter?

d. At V_o (with body effect) = 2.5 V, find V_o (ideal) and thus determine the maximum error
introduced by the body effect.

10. For this problem assume:

V_DD = 2.5 V, W_P/L = 1.25/0.25, W_N/L = 0.375/0.25, L = L_eff = 0.25 µm (i.e., x_d = 0 µm),
C_L = C_inv-gate, k_n' = 115 µA/V², k_p' = -30 µA/V², V_Tn0 = |V_Tp0| = 0.4 V, λ = 0 V⁻¹,
γ = 0.4, 2|φ_f| = 0.6 V, and t_ox = 58 Å. Use the HSPICE model parameters for parasitic
capacitance given below (i.e., C_gd0, C_j, C_jsw), and assume that V_SB = 0 V for all problems
except part (f).
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V dd =2.5V 




Figure 5.8 CMOS inverter with capacitive 



## Parasitic Capacitance Parameters (F/m) ##

NMOS: CGDO = 3.11 × 10^-10, CGSO = 3.11 × 10^-10, CJ = 2.02 × 10^-3, CJSW = 2.75 × 10^-10

PMOS: CGDO = 2.68 × 10^-10, CGSO = 2.68 × 10^-10, CJ = 1.93 × 10^-3, CJSW = 2.23 × 10^-10

a. What is the V_M for this inverter?

b. What is the effective load capacitance C_Leff of this inverter? (Include parasitic capacitance;
refer to the text for K_eq and m.) Hint: You must assume certain values for the source/drain
areas and perimeters since there is no layout. For our scalable CMOS process, λ = 0.125
µm, and the source/drain extensions are 5λ for the PMOS; for the NMOS the source/drain
contact regions are 5λ × 5λ.

c. Calculate t_PHL and t_PLH assuming the result of (b) is ‘C_Leff = 6.5 fF’. (Assume an ideal step
input, i.e., t_rise = t_fall = 0. Do this part by computing the average current used to charge/discharge
C_Leff.)

d. Find (W_p/W_n) such that t_PHL = t_PLH.

e. Suppose we increase the width of the transistors to reduce t_PHL and t_PLH. Do we get a
proportional decrease in the delay times? Justify your answer.

f. Suppose V_SB = 1 V. What are the values of V_Tn, V_Tp, and V_M? How does this qualitatively affect
C_Leff?

11. Using HSPICE, answer the following questions.

a. Simulate the circuit in Problem 10 and measure t_p and the average power for input V_in:
pulse(0 V_DD 5n 0.1n 0.1n 9n 20n), as V_DD varies from 1 V to 2.5 V with a 0.25 V interval
[t_p = (t_PHL + t_PLH)/2]. Using this data, plot ‘t_p vs. V_DD’ and ‘Power vs. V_DD’.

Specify AS, AD, PS, PD in your SPICE deck, and manually add C_L = 6.5 fF. Set V_SB = 0 V
for this problem.

b. For V_DD equal to 2.5 V, determine the maximum fan-out of identical inverters this gate can
drive before its delay becomes larger than 2 ns.

c. Simulate the same circuit for a set of ‘pulse’ inputs with rise and fall times of t_in_rise,fall
= 1 ns, 2 ns, 5 ns, 10 ns, 20 ns. For each input, measure (1) the rise and fall times t_out_rise and
t_out_fall of the inverter output, (2) the total energy lost E_total, and (3) the energy lost due to
short-circuit current E_short.

Using this data, prepare a plot of (1) (t_out_rise + t_out_fall)/2 vs. t_in_rise,fall, (2) E_total vs.
t_in_rise,fall, (3) E_short vs. t_in_rise,fall, and (4) E_short/E_total vs. t_in_rise,fall.

d. Provide simple explanations for:

(i) Why is the slope for (1) less than 1?

(ii) Why does E_short increase with t_in_rise,fall?

(iii) Why does E_total increase with t_in_rise,fall?

12. Consider the low swing driver of Figure 5.9:




a. What is the voltage swing on the output node (V_out)? Assume γ = 0.

b. Estimate (i) the energy drawn from the supply and (ii) energy dissipated for a 0 V to 2.5 V
transition at the input. Assume that the rise and fall times at the input are 0. Repeat the
analysis for a 2.5 V to 0 V transition at the input.

c. Compute t_pLH (i.e., the time to transition from V_OL to (V_OH + V_OL)/2). Assume the input
rise time to be 0. V_OL is the output voltage with the input at 0 V and V_OH is the output voltage
with the input at 2.5 V.

d. Compute V_OH taking into account body effect. Assume γ = 0.5 V^(1/2) for both NMOS and
PMOS.

13. Consider the following low swing driver consisting of NMOS devices M1 and M2. Assume
an NWELL implementation. Assume that the inputs IN and $\overline{IN}$ have a 0 V to 2.5 V swing and
that V_IN = 0 V when V_$\overline{IN}$ = 2.5 V and vice versa. Also assume that there is no skew between IN
and $\overline{IN}$ (i.e., the inverter delay to derive $\overline{IN}$ from IN is zero).




a. What voltage is the bulk terminal of M2 connected to? 
b. What is the voltage swing on the output node as the inputs swing from 0 V to 2.5 V? Show
the low value and the high value.

c. Assume that the inputs IN and $\overline{IN}$ have zero rise and fall times. Assume a zero skew
between IN and $\overline{IN}$. Determine the low-to-high propagation delay for charging the output
node measured from the 50% point of the input to the 50% point of the output. Assume
that the total load capacitance is 1 pF, including the transistor parasitics.

d. Assume that, instead of the 1 pF load, the low swing driver drives a non-linear capacitor,
whose capacitance vs. voltage is plotted below. Compute the energy drawn from the low
supply for charging up the load capacitor. Ignore the parasitic capacitance of the driver circuit
itself.




14. The inverter below operates with V_DD = 0.4 V and is composed of |V_T| = 0.5 V devices. The
devices have identical I_0 and n.

a. Calculate the switching threshold (V_M) of this inverter.

b. Calculate V_IL and V_IH of the inverter.



V DD - 0.4V 




Figure 5.11 Inverter in Weak Inversion Regime 



15. Sizing a chain of inverters. 

a. In order to drive a large capacitance (C_L = 20 pF) from a minimum size gate (with input
capacitance C_i = 10 fF), you decide to introduce a two-staged buffer as shown in Figure
5.12. Assume that the propagation delay of a minimum size inverter is 70 ps. Also assume







that the input capacitance of a gate is proportional to its size. Determine the sizing of the 
two additional buffer stages that will minimize the propagation delay. 




Figure 5.12 Buffer insertion for driving large loads. 

b. If you could add any number of stages to achieve the minimum delay, how many stages 
would you insert? What is the propagation delay in this case? 

c. Describe the advantages and disadvantages of the methods shown in (a) and (b). 

d. Determine a closed form expression for the power consumption in the circuit. Consider 
only gate capacitances in your analysis. What is the power consumption for a supply volt- 
age of 2.5V and an activity factor of 1? 

16. [M, None, 3.3.5] Consider scaling a CMOS technology by S > 1. In order to maintain compatibility
with existing system components, you decide to use constant voltage scaling.

a. In traditional constant voltage scaling, transistor widths scale inversely with S, W ∝ 1/S.
To avoid the power increases associated with constant voltage scaling, however, you
decide to change the scaling factor for W. What should this new scaling factor be to maintain
approximately constant power? Assume long-channel devices (i.e., neglect velocity
saturation).

b. How does delay scale under this new methodology? 

c. Assuming short-channel devices (i.e., velocity saturation), how would transistor widths 
have to scale to maintain the constant power requirement? 
DESIGN PROBLEM

Using the 0.25 µm CMOS introduced in Chapter 2, design a static CMOS
inverter that meets the following requirements:

1. Matched pull-up and pull-down times (i.e., t_pHL = t_pLH).

2. t_p = 5 nsec (± 0.1 nsec).

The load capacitance connected to the output is equal to 4 pF. Notice that this
capacitance is substantially larger than the internal capacitances of the gate.

Determine the W and L of the transistors. To reduce the parasitics, use
minimal lengths (L = 0.25 µm) for all transistors. Verify and optimize the design
using SPICE after proposing a first design using manual computations. Also
compute the energy consumed per transition. If you have a layout editor (such
as MAGIC) available, perform the physical design, extract the real circuit
parameters, and compare the simulated results with the ones obtained earlier.
6.1 Introduction



The design considerations for a simple inverter circuit were presented in the previous
chapter. Now, we will extend this discussion to address the synthesis of arbitrary digital
gates such as NOR, NAND and XOR. The focus is on combinational logic (or non-regenerative)
circuits; that is, circuits that have the property that at any point in time, the output
of the circuit is related to its current input signals by some Boolean expression (assuming
that the transients through the logic gates have settled). No intentional connection between
outputs and inputs is present.

This is in contrast to another class of circuits, known as sequential or regenerative, 
for which the output is not only a function of the current input data, but also of previous 
values of the input signals (Figure 6.1). This is accomplished by connecting one or more 
outputs intentionally back to some inputs. Consequently, the circuit “remembers” past 
events and has a sense of history. A sequential circuit includes a combinational logic por- 
tion and a module that holds the state. Example circuits are registers, counters, oscillators, 
and memory. Sequential circuits are the topic of the next Chapter. 



(a) Combinational: In → Combinational Logic Circuit → Out

(b) Sequential: In → Out



Figure 6.1 High level classification of logic circuits. 



There are numerous circuit styles to implement a given logic function. As with the 
inverter, the common design metrics by which a gate is evaluated are area, speed, energy 
and power. Depending on the application, the emphasis will be on different metrics. For 
instance, the switching speed of digital circuits is the primary metric in a high-perfor- 
mance processor, while it is energy dissipation in a battery operated circuit. In addition to 
these metrics, robustness to noise and reliability are also very important considerations. 
We will see that certain logic styles can significantly improve performance, but are more 
sensitive to noise. Recently, power dissipation has also become a very important require- 
ment and significant emphasis is placed on understanding the sources of power and 
approaches to deal with power. 



6.2 Static CMOS Design 

The most widely used logic style is static complementary CMOS. The static CMOS style
is really an extension of the static CMOS inverter to multiple inputs. In review, the primary
advantages of the CMOS structure are robustness (i.e., low sensitivity to noise), good
performance, and low power consumption with no static power dissipation. Most of those
properties are carried over to large fan-in logic gates implemented using a similar circuit
topology.

The complementary CMOS circuit style falls under a broad class of logic circuits 
called static circuits in which at every point in time (except during the switching tran- 
sients), each gate output is connected to either V DD or V ss via a low-resistance path. Also, 
the outputs of the gates assume at all times the value of the Boolean function implemented 
by the circuit (ignoring, once again, the transient effects during switching periods). This is 
in contrast to the dynamic circuit class, which relies on temporary storage of signal values 
on the capacitance of high-impedance circuit nodes. The latter approach has the advantage 
that the resulting gate is simpler and faster. Its design and operation are however more 
involved and prone to failure due to an increased sensitivity to noise. 

In this section, we sequentially address the design of various static circuit flavors 
including complementary CMOS, ratioed logic (pseudo-NMOS and DCVSL), and pass- 
transistor logic. The issues of scaling to lower power supply voltages and threshold volt- 
ages will also be dealt with. 

6.2.1 Complementary CMOS 

Concept 

A static CMOS gate is a combination of two networks, called the pull-up network (PUN) 
and the pull-down network (PDN) (Figure 6.2). The figure shows a generic N input logic 
gate where all inputs are distributed to both the pull-up and pull-down networks. The func- 
tion of the PUN is to provide a connection between the output and V DD anytime the output 
of the logic gate is meant to be 1 (based on the inputs). Similarly, the function of the PDN 
is to connect the output to V ss when the output of the logic gate is meant to be 0. The PUN 
and PDN networks are constructed in a mutually exclusive fashion such that one and only 
one of the networks is conducting in steady state. In this way, once the transients have set- 
tled, a path always exists between V DD and the output F, realizing a high output (“one”), 
or, alternatively, between V ss and F for a low output (“zero”). This is equivalent to stating 
that the output node is always a low-impedance node in steady state. 







Figure 6.2 Complementary logic gate as a combination of a PUN (pull-up network) and a 
PDN (pull-down network). 
In constructing the PDN and PUN networks, the following observations should be
kept in mind:

• A transistor can be thought of as a switch controlled by its gate signal. An NMOS 
switch is on when the controlling signal is high and is off when the controlling signal 
is low. A PMOS transistor acts as an inverse switch that is on when the controlling 
signal is low and off when the controlling signal is high. 



• The PDN is constructed using NMOS devices, while PMOS transistors are used in
the PUN. The primary reason for this choice is that NMOS transistors produce
“strong zeros,” and PMOS devices generate “strong ones”. To illustrate this, consider
the examples shown in Figure 6.3. In Figure 6.3a, the output capacitance is initially
charged to V_DD. Two possible discharge scenarios are shown. An NMOS
device pulls the output all the way down to GND, while a PMOS lowers the output
no further than |V_Tp|; the PMOS turns off at that point and stops contributing discharge
current. NMOS transistors are hence the preferred devices in the PDN. Similarly,
two alternative approaches to charging up a capacitor are shown in Figure
6.3b, with the output initially at GND. A PMOS switch succeeds in charging the
output all the way to V_DD, while the NMOS device fails to raise the output above
V_DD − V_Tn. This explains why PMOS transistors are preferentially used in a PUN.



(a) Pulling down a node using NMOS and PMOS switches (NMOS: Out goes from V_DD to 0; PMOS: Out goes from V_DD to |V_Tp|)

(b) Pulling up a node using NMOS and PMOS switches (NMOS: Out goes from 0 to V_DD − V_Tn; PMOS: Out goes from 0 to V_DD)



Figure 6.3 Simple examples 
illustrate why an NMOS should be 
used as a pull-down, and a PMOS 
should be used as a pull-up device. 



• A set of construction rules can be derived to construct logic functions (Figure 6.4).
NMOS devices connected in series correspond to an AND function. With all the
inputs high, the series combination conducts and the value at one end of the chain is
transferred to the other end. Similarly, NMOS transistors connected in parallel represent
an OR function. A conducting path exists between the output and input terminal
if at least one of the inputs is high. Using similar arguments, construction rules
for PMOS networks can be formulated. A series connection of PMOS conducts if
both inputs are low, representing a NOR function ($\overline{A} \cdot \overline{B} = \overline{A + B}$), while PMOS transistors
in parallel implement a NAND ($\overline{A} + \overline{B} = \overline{A \cdot B}$).

• Using De Morgan’s theorems ($\overline{A + B} = \overline{A} \cdot \overline{B}$ and $\overline{A \cdot B} = \overline{A} + \overline{B}$), it can be shown that
the pull-up and pull-down networks of a complementary CMOS structure are dual
networks. This means that a parallel connection of transistors in the pull-up network
corresponds to a series connection of the corresponding devices in the pull-down
network, and vice versa. Therefore, to construct a CMOS gate, one of the networks
(e.g., PDN) is implemented using combinations of series and parallel devices. The
other network (i.e., PUN) is obtained using the duality principle by walking the hierarchy,
replacing series sub-nets with parallel sub-nets, and parallel sub-nets with
series sub-nets. The complete CMOS gate is constructed by combining the PDN
with the PUN.

(a) Series combination: conducts if A · B    (b) Parallel combination: conducts if A + B

Figure 6.4 NMOS logic rules: series devices implement an AND, and parallel devices
implement an OR.

• The complementary gate is naturally inverting, implementing only functions such as
NAND, NOR, and XNOR. The realization of a non-inverting Boolean function
(such as AND, OR, or XOR) in a single stage is not possible, and requires the addition
of an extra inverter stage.

• The number of transistors required to implement an N-input logic gate is 2N.



Example 6.1 Two-input NAND Gate 

Figure 6.5 shows a two-input NAND gate ($F = \overline{A \cdot B}$). The PDN network consists of two
NMOS devices in series that conduct when both A and B are high. The PUN is the dual network,
and consists of two parallel PMOS transistors. This means that F is 1 if A = 0 or B = 0,
which is equivalent to $F = \overline{A \cdot B}$. The truth table for the simple two-input NAND gate is given
in Table 6.1. It can be verified that the output F is always connected to either V_DD or GND,
but never to both at the same time.



Table 6.1 Truth table for the two-input NAND

A    B    F
0    0    1
0    1    1
1    0    1
1    1    0



Figure 6.5 Two-input NAND gate in complementary static CMOS style. 



Example 6.2 Synthesis of complex CMOS Gate 

Using complementary CMOS logic, consider the synthesis of a complex CMOS gate whose
function is $F = \overline{D + A \cdot (B + C)}$. The first step in the synthesis of the logic gate is to derive the
pull-down network as shown in Figure 6.6a by using the fact that NMOS devices in series
implement the AND function and parallel devices implement the OR function. The next step
is to use duality to derive the PUN in a hierarchical fashion. The PDN network is broken into
smaller networks (i.e., subsets of the PDN) called sub-nets that simplify the derivation of the
PUN. In Figure 6.6b, the sub-nets (SN) for the pull-down network are identified. At the top
level, SN1 and SN2 are in parallel, so in the dual network they will be in series. Since SN1
consists of a single transistor, it maps directly to the pull-up network. On the other hand, we
need to recursively apply the duality rules to SN2. Inside SN2, we have SN3 and SN4 in
series, so in the PUN they will appear in parallel. Finally, inside SN3, the devices are in parallel,
so they appear in series in the PUN. The complete gate is shown in Figure 6.6c. The reader
can verify that for every possible input combination, there always exists a path to either V_DD
or GND.
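
The hierarchical duality procedure of this example can be sketched in a few lines of Python. This is only an illustrative switch-level model, not a circuit simulation: the tuple representation of a network and the helper names `dual`, `conducts`, and `check_gate` are assumptions made for this sketch. It derives the PUN from the PDN by swapping series and parallel connections and then checks, for all 16 input combinations, that exactly one network conducts and that the output matches F.

```python
from itertools import product

# A network is a transistor name (str) or a tuple
# ("series" | "parallel", [sub-networks]). Representation assumed for this sketch.

def dual(net):
    """Swap series and parallel connections at every level of the hierarchy."""
    if isinstance(net, str):
        return net
    kind, parts = net
    flipped = "parallel" if kind == "series" else "series"
    return (flipped, [dual(p) for p in parts])

def conducts(net, inputs, on_when_high):
    """Switch-level conduction: NMOS is on for a high gate, PMOS for a low gate."""
    if isinstance(net, str):
        return inputs[net] == on_when_high
    kind, parts = net
    states = [conducts(p, inputs, on_when_high) for p in parts]
    return all(states) if kind == "series" else any(states)

def check_gate(pdn, pun, names):
    """Return (inputs, F) rows; fail if the networks are not mutually exclusive."""
    rows = []
    for values in product([False, True], repeat=len(names)):
        inputs = dict(zip(names, values))
        down = conducts(pdn, inputs, on_when_high=True)   # NMOS network
        up = conducts(pun, inputs, on_when_high=False)    # PMOS network
        assert down != up, "exactly one network must conduct in steady state"
        rows.append((inputs, up))  # the output is 1 when the PUN conducts
    return rows

# PDN of F = NOT(D + A.(B + C)): D in parallel with (A in series with (B parallel C)).
pdn = ("parallel", ["D", ("series", ["A", ("parallel", ["B", "C"])])])
pun = dual(pdn)

rows = check_gate(pdn, pun, ["A", "B", "C", "D"])
for inputs, f in rows:
    expected = not (inputs["D"] or (inputs["A"] and (inputs["B"] or inputs["C"])))
    assert f == expected

print(pun)
```

The resulting `pun` places D in series with the parallel combination of A and the series pair B, C, which is the structure described in the text for SN1 through SN4.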

(a) Pull-down network    (b) Deriving the pull-up network hierarchically by identifying sub-nets



Figure 6.6 Complex complementary CMOS gate. 




Static Properties of Complementary CMOS Gates 

Complementary CMOS gates inherit all the nice properties of the basic CMOS inverter.
They exhibit rail-to-rail swing with V_OH = V_DD and V_OL = GND. The circuits also have no
static power dissipation, since the circuits are designed such that the pull-down and pull-up
networks are mutually exclusive. The analysis of the DC voltage transfer characteristics
and the noise margins is more complicated than for the inverter, as these parameters
depend upon the data input patterns applied to the gate.

Consider the static two-input NAND gate shown in Figure 6.7. Three possible input
combinations switch the output of the gate from high to low: (a) A = B = 0 → 1, (b) A = 1,
B = 0 → 1, and (c) B = 1, A = 0 → 1. The resulting voltage transfer curves display significant
differences. The large variation between case (a) and the others (b and c) is explained
by the fact that in the former case both transistors in the pull-up network are on simultaneously
for A = B = 0, representing a strong pull-up. In the latter cases, only one of the pull-up
devices is on. The VTC is shifted to the left as a result of the weaker PUN.

Figure 6.7 The VTC of a two-input NAND is data-dependent. NMOS devices are
0.5 µm/0.25 µm while the PMOS devices are sized at 0.75 µm/0.25 µm.

The difference between (b) and (c) results mainly from the state of the internal node
int between the two NMOS devices. For the NMOS devices to turn on, both gate-to-source
voltages must be above V_Tn, with V_GS2 = V_A − V_DS1 and V_GS1 = V_B. The threshold
voltage of transistor M_2 will be higher than that of transistor M_1 due to the body effect. The
threshold voltages of the two devices are given by:



$$V_{Tn2} = V_{Tn0} + \gamma\left(\sqrt{|2\phi_f| + V_{int}} - \sqrt{|2\phi_f|}\right) \qquad (6.1)$$

$$V_{Tn1} = V_{Tn0} \qquad (6.2)$$

For case (b), M_3 is turned off, and the gate voltage of M_2 is set to V_DD. To a first
order, M_2 may be considered as a resistor in series with M_1. Since the drive on M_2 is large,
this resistance is small and has only a small effect on the voltage transfer characteristics.
In case (c), transistor M_1 acts as a resistor, causing body effect in M_2. The overall impact
is quite small, as seen from the plot.
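
To get a feel for the size of the body effect on M_2, the threshold expression of Eq. (6.1) can be evaluated numerically. The sketch below is illustrative only: the function name is hypothetical, and the parameter values (V_Tn0 = 0.43 V, γ = 0.4, 2|φ_f| = 0.6 V) are borrowed from the exercise set of the previous chapter and assumed here for demonstration, not taken from the simulation behind Figure 6.7.

```python
import math

def threshold_with_body_effect(v_tn0, gamma, two_phi_f, v_int):
    """Evaluate Eq. (6.1): threshold of M2 when its source sits at v_int volts."""
    return v_tn0 + gamma * (math.sqrt(abs(two_phi_f) + v_int)
                            - math.sqrt(abs(two_phi_f)))

# Assumed illustrative parameters (see the lead-in above).
V_TN0 = 0.43      # zero-bias threshold, volts
GAMMA = 0.4       # body-effect coefficient
TWO_PHI_F = 0.6   # |2*phi_f|, volts

for v_int in (0.0, 0.1, 0.25, 0.5):
    v_tn2 = threshold_with_body_effect(V_TN0, GAMMA, TWO_PHI_F, v_int)
    print(f"V_int = {v_int:.2f} V  ->  V_Tn2 = {v_tn2:.3f} V")
```

With V_int = 0 the expression reduces to V_Tn0, consistent with Eq. (6.2) for M_1; the threshold rises monotonically as the internal node voltage increases.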



Design Consideration 



The important point to take away from the above discussion is that the noise margins are 
input-pattern dependent. For the above example, a glitch on only one of the two inputs has a 
larger chance of creating a false transition at the output than when the glitch would occur on 
both inputs simultaneously. Therefore, the former condition has a lower low noise margin. A 
common practice when characterizing gates such as NAND and NOR is to connect all the 
inputs together. This unfortunately does not represent the worst-case static behavior. The data 
dependencies should be carefully modeled. 



Propagation Delay of Complementary CMOS Gates 

The computation of propagation delay proceeds in a fashion similar to the static inverter.
For the purpose of delay analysis, each transistor is modeled as a resistor in series with an
ideal switch. The value of the resistance is dependent on the power supply voltage and an
equivalent large signal resistance, scaled by the ratio of device width over length, must be
used. The logic is transformed into an equivalent RC network that includes the effect of
internal node capacitances. Figure 6.8 shows the two-input NAND gate and its equivalent
RC switch level model. Note that the internal node capacitance C_int (attributable to the
source/drain regions and the gate overlap capacitance of M_2/M_1) is included. While complicating
the analysis, the capacitance of the internal nodes can have quite an impact in
some networks such as large fan-in gates. In a first pass, we ignore the effect of the internal
capacitance.




Figure 6.8 Equivalent RC 
model for a 2-input NAND gate. 



(a) Two-input NAND (b) RC equivalent model 

A simple analysis of the model shows that, similar to the noise margins, the
propagation delay depends upon the input patterns. Consider for instance the low-to-high
transition. Three possible input scenarios can be identified for charging the output to
V_DD. If both inputs are driven low, the two PMOS devices are on. The delay in this case is
0.69 × (R_P/2) × C_L, since the two resistors are in parallel. This is not the worst-case low-to-high
transition, which occurs when only one device turns on, and is given by 0.69 × R_P ×
C_L. For the pull-down path, the output is discharged only if both A and B are switched
high, and the delay is given by 0.69 × (2R_N) × C_L to a first order. In other words, adding
devices in series slows down the circuit, and devices must be made wider to avoid a performance
penalty. When sizing the transistors in a gate with multiple inputs, we should
pick the combination of inputs that triggers the worst-case conditions.

For example, for a NAND gate to have the same pull-down delay (t_pHL) as a minimum-sized
inverter, the NMOS devices in the NAND stack must be made twice as wide
so that the equivalent resistance of the NAND pull-down is the same as that of the inverter. The
PMOS devices can remain unchanged.

This first-order analysis assumes that the extra capacitance introduced by widening 
the transistors can be ignored. This is not a good assumption in general, but allows for a 
reasonable first cut at device sizing. 
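
The first-order expressions above are easy to tabulate. The following sketch evaluates them for a two-input NAND; the function name and the numerical values of R_N, R_P, and C_L are hypothetical placeholders chosen only to illustrate the arithmetic, and internal node capacitance is ignored, as in the text.

```python
def nand2_first_order_delays(r_n, r_p, c_load):
    """First-order RC delays of a two-input NAND (internal capacitance ignored).

    r_n, r_p: equivalent on-resistance of one NMOS / one PMOS device (ohms)
    c_load:   output load capacitance (farads)
    """
    k = 0.69
    return {
        "tpLH_both_inputs_low": k * (r_p / 2) * c_load,  # two PMOS in parallel
        "tpLH_one_input_low": k * r_p * c_load,          # worst-case pull-up
        "tpHL": k * (2 * r_n) * c_load,                  # two NMOS in series
    }

# Hypothetical values for illustration only.
R_N, R_P, C_L = 10e3, 20e3, 10e-15

base = nand2_first_order_delays(R_N, R_P, C_L)
# Doubling the NMOS width halves its resistance, which restores the
# pull-down delay of a single device: 0.69 * R_N * C_L.
widened = nand2_first_order_delays(R_N / 2, R_P, C_L)

for name, delay in base.items():
    print(f"{name}: {delay * 1e12:.1f} ps")
print(f"tpHL with 2x-wide NMOS: {widened['tpHL'] * 1e12:.1f} ps")
```

As the surrounding text cautions, widening the devices also adds capacitance, which this sketch does not model.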



Example 6.3 Delay dependence on input patterns 

Consider the NAND gate of Figure 6.8a. Assume NMOS and PMOS devices of
0.5 µm/0.25 µm and 0.75 µm/0.25 µm, respectively. This sizing should result in approximately
equal worst-case rise and fall times (since the effective resistance of the pull-down is
designed to be equal to the pull-up resistance).

Figure 6.9 shows the simulated low-to-high delay for different input patterns. As
expected, the case where both inputs go low (A = B = 1 → 0) results in a smaller
delay, compared to the case where only one input is driven low. Notice that the worst-case
low-to-high delay depends upon which input (A or B) goes low. The reason for this involves
the internal node capacitance of the pull-down stack (i.e., the source of M_2). For the case that
B = 1 and A transitions from 1 → 0, the pull-up PMOS device only has to charge up the output
node capacitance since M_2 is turned off. On the other hand, for the case where A = 1 and B transitions
from 1 → 0, the pull-up PMOS device has to charge up the sum of the output and the
internal node capacitances, which slows down the transition.



Input Data Pattern      Delay (psec)
A = B = 0→1             69
A = 1, B = 0→1          62
A = 0→1, B = 1          50
A = B = 1→0             35
A = 1, B = 1→0          76
A = 1→0, B = 1          57




Figure 6.9 Example showing the delay dependence on input patterns. 

The table in Figure 6.9 shows a compilation of various delays for this circuit. The
first-order transistor sizing indeed provides approximately equal rise and fall delays. An
important point to note is that the high-to-low propagation delay depends on the state of
the internal nodes. For example, when both inputs transition from 0→1, it is important to
establish the state of the internal node. The worst case happens when the internal node is
charged up to $V_{DD} - V_{Tn}$. The worst case can be ensured by pulsing the A input from
1→0→1, while input B only makes the 0→1 transition. In this way, the internal node is
initialized properly.

The important point to take away from this example is that estimation of delay can be
fairly complex, and requires a careful consideration of internal node capacitances and data
patterns. Care must be taken to model the worst-case scenario in the simulations. A
brute-force approach that applies all possible input patterns may not always work, as it is
important to consider the state of internal nodes.



The CMOS implementation of a NOR gate ($F = \overline{A + B}$) is shown in Figure 6.10.
The output of this network is high if and only if both inputs A and B are low. The
worst-case pull-down transition happens when only one of the NMOS devices turns on (i.e., if
either A or B is high). Assume that the goal is to size the NOR gate such that it has
approximately the same delay as an inverter with the following device sizes: NMOS
0.5 µm/0.25 µm and PMOS 1.5 µm/0.25 µm. Since the pull-down path in the worst case is a
single device, the NMOS devices ($M_1$ and $M_2$) can have the same device widths as the
NMOS device in the inverter. For the output to be pulled high, both PMOS devices must be
turned on. Since the resistances add, the devices must be made two times larger compared
to the PMOS in the inverter (i.e., $M_3$ and $M_4$ must have a size of 3 µm/0.25 µm). Since
PMOS devices have a lower mobility relative to NMOS devices, stacking PMOS devices in series
must be avoided as much as possible. A NAND implementation is clearly preferred over a
NOR implementation for implementing generic logic.
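The sizing rule used for the NAND and NOR gates (widen the series stack by its depth, leave the parallel devices at inverter size) can be written down directly. This is a rough sketch under the first-order assumption that resistance scales inversely with width; the function name is hypothetical and the extra capacitance of wider devices is ignored.

```python
def stack_widths_to_match_inverter(gate, n_inputs, w_n_inv, w_p_inv):
    """Return (NMOS width, PMOS width) per device so that the worst-case
    pull-up and pull-down resistances match a reference inverter.

    Series devices are widened by the stack depth (their resistances add);
    parallel devices keep the inverter width (worst case: one device on).
    """
    if gate == "NAND":   # NMOS in series, PMOS in parallel
        return n_inputs * w_n_inv, w_p_inv
    if gate == "NOR":    # NMOS in parallel, PMOS in series
        return w_n_inv, n_inputs * w_p_inv
    raise ValueError("gate must be 'NAND' or 'NOR'")


# Reference inverter from the text: NMOS 0.5 um, PMOS 1.5 um (L = 0.25 um).
print(stack_widths_to_match_inverter("NOR", 2, 0.5, 1.5))    # (0.5, 3.0)
print(stack_widths_to_match_inverter("NAND", 2, 0.5, 1.5))   # (1.0, 1.5)
```

For the 2-input NOR this reproduces the 3 µm PMOS width quoted above; the NAND needs only 1 µm NMOS devices, which illustrates why it is the cheaper gate.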




Figure 6.10 Sizing of a NOR gate to produce the same delay as an inverter with size of
NMOS: 0.5 µm/0.25 µm and PMOS: 1.5 µm/0.25 µm.



Problem 6.1 Transistor Sizing in Complementary CMOS Gates 

Determine the transistor sizes of the individual transistors in Figure 6.6c such that it has
approximately the same $t_{pLH}$ and $t_{pHL}$ as an inverter with the following sizes:
NMOS: 0.5 µm/0.25 µm and PMOS: 1.5 µm/0.25 µm.

So far in the analysis of propagation delay, we have ignored the effect of internal
node capacitances. This is often a reasonable assumption for a first-order analysis.
However, in more complex logic gates that have large fan-in, the internal node capacitances
can become significant. Consider a 4-input NAND gate as shown in Figure 6.11, which
shows the equivalent RC model of the gate, including the internal node capacitances. The
internal capacitances consist of the junction capacitance of the transistors, as well as the
gate-to-source and gate-to-drain capacitances. The latter are turned into capacitances to
ground using the Miller equivalence. The delay analysis for such a circuit involves solving
distributed RC networks, a problem we already encountered when analyzing the delay of
interconnect networks. Consider the pull-down delay of the circuit. The output is
discharged when all inputs are driven high. The proper initial conditions must be placed on
the internal nodes (that is, the internal nodes must be charged to $V_{DD} - V_{Tn}$) before
the inputs are driven high.

The propagation delay can be computed using the Elmore delay model and is
approximated as:

$t_{pHL} = 0.69 (R_1 \cdot C_1 + (R_1 + R_2) \cdot C_2 + (R_1 + R_2 + R_3) \cdot C_3 + (R_1 + R_2 + R_3 + R_4) \cdot C_L)$    (6.3)

Notice that the resistance of $M_1$ appears in all the terms, which makes this device
especially important when attempting to minimize delay. Assuming that all NMOS
devices have an equal size, Eq. (6.3) simplifies to

$t_{pHL} = 0.69 R_N (C_1 + 2 \cdot C_2 + 3 \cdot C_3 + 4 \cdot C_L)$    (6.4)
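The Elmore sum of Eq. (6.3) generalizes to a stack of any depth: each node capacitance is multiplied by the total resistance between that node and ground. A minimal sketch follows, assuming the devices are listed from the one whose resistance appears in every term up to the output; the function name is illustrative.

```python
def elmore_pulldown_delay(resistances, capacitances):
    """Elmore estimate of t_pHL for a series pull-down chain, as in Eq. (6.3).

    resistances[i]  -> R_(i+1), ordered so that R_1 appears in every term.
    capacitances[i] -> capacitance of the node reached through R_1..R_(i+1);
                       the last entry is the output load C_L.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    total = 0.0
    path_resistance = 0.0
    for r, c in zip(resistances, capacitances):
        path_resistance += r          # R_1 + ... + R_(i+1)
        total += path_resistance * c  # contribution of this node
    return 0.69 * total


# Equal-sized devices reduce to Eq. (6.4): 0.69 * R * (C1 + 2*C2 + 3*C3 + 4*CL).
print(elmore_pulldown_delay([1.0] * 4, [1.0] * 4))  # 0.69 * 10
```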



Example 6.4 A Four-Input Complementary CMOS NAND Gate 

In this example, the intrinsic propagation delay of the 4-input NAND gate (without any
loading) is evaluated using hand analysis and simulation. Assume that all NMOS devices have
a W/L of 0.5 µm/0.25 µm, and all PMOS devices have a device size of 0.375 µm/0.25 µm. The
layout of a four-input NAND gate is shown in Figure 6.12. The devices are sized such that the
worst-case rise and fall times are approximately equal (to first order, ignoring the internal
node capacitances).

Using techniques similar to those employed for the CMOS inverter in Chapter 3, the
capacitance values can be computed from the layout. Notice that in the pull-up path, the
PMOS devices share the drain terminal in order to reduce the overall parasitic contribution
to the output. Using our standard design rules, the area and perimeter for various devices
can be easily computed, as shown in Table 6.1.

In this example, we will focus on the pull-down delay, and the capacitances will be
computed for the high-to-low transition at the output. While the output makes a transition
from $V_{DD}$ to 0, the internal nodes only transition from $V_{DD} - V_{Tn}$ to GND. We
would need to linearize the internal junction capacitances for this voltage transition, but,
to simplify the analysis, we will use the same $K_{eq}$ factors for the internal nodes as
for the output node.




Figure 6.12 Layout of a four-input NAND gate in complementary CMOS.

Table 6.1 Area and perimeter of transistors in 4-input NAND gate.

Transistor   W (µm)   AS (µm²)    AD (µm²)    PS (µm)   PD (µm)
1            0.5      0.3125      0.0625      1.75      0.25
2            0.5      0.0625      0.0625      0.25      0.25
3            0.5      0.0625      0.0625      0.25      0.25
4            0.5      0.0625      0.3125      0.25      1.75
5            0.375    0.296875    0.171875    1.875     0.875
6            0.375    0.171875    0.171875    0.875     0.875
7            0.375    0.171875    0.171875    0.875     0.875
8            0.375    0.296875    0.171875    1.875     0.875



It is assumed that the output connects to a single, minimum-size inverter. The effect of
intra-cell routing, which is small, is ignored. The various contributions are summarized in
Table 6.2. For the NMOS and PMOS junctions, we use $K_{eq} = 0.57$, $K_{eqsw} = 0.61$, and
$K_{eq} = 0.79$, $K_{eqsw} = 0.86$, respectively. Notice that the gate-to-drain capacitance
is multiplied by a factor of two for all internal nodes and the output node to account for
the Miller effect (this ignores the fact that the internal nodes have a slightly smaller
swing due to the threshold drop).



Table 6.2 Computation of capacitances for high-to-low transition at the output. The table shows 
the intrinsic delay of the gate without extra loading. Any fan-out capacitance would simply be 
added to the C L term. 



Capacitor   Contributions (H→L)                  Value (fF) (H→L)

C1          C_d1 + C_s2 + 2*C_gd1 + 2*C_gs2      (0.57 * 0.0625 * 2 + 0.61 * 0.25 * 0.28) +
                                                 (0.57 * 0.0625 * 2 + 0.61 * 0.25 * 0.28) +
                                                 2 * (0.31 * 0.5) + 2 * (0.31 * 0.5) = 0.85 fF

C2          C_d2 + C_s3 + 2*C_gd2 + 2*C_gs3      (0.57 * 0.0625 * 2 + 0.61 * 0.25 * 0.28) +
                                                 (0.57 * 0.0625 * 2 + 0.61 * 0.25 * 0.28) +
                                                 2 * (0.31 * 0.5) + 2 * (0.31 * 0.5) = 0.85 fF

C3          C_d3 + C_s4 + 2*C_gd3 + 2*C_gs4      (0.57 * 0.0625 * 2 + 0.61 * 0.25 * 0.28) +
                                                 (0.57 * 0.0625 * 2 + 0.61 * 0.25 * 0.28) +
                                                 2 * (0.31 * 0.5) + 2 * (0.31 * 0.5) = 0.85 fF

CL          C_d4 + 2*C_gd4 + C_d5 + C_d6 +       (0.57 * 0.3125 * 2 + 0.61 * 1.75 * 0.28) +
            C_d7 + C_d8 + 2*C_gd5 + 2*C_gd6 +    2 * (0.31 * 0.5) +
            2*C_gd7 + 2*C_gd8                    4 * (0.79 * 0.171875 * 1.9 + 0.86 * 0.875 * 0.22) +
            = C_d4 + 2*C_gd4 + 4*C_d5 +          4 * 2 * (0.27 * 0.375) = 3.47 fF
              4*2*C_gd6



Using Eq. (6.4), we can compute the propagation delay as:

$t_{pHL} = 0.69 \left(\frac{13\,\mathrm{k}\Omega}{2}\right) (0.85\,\mathrm{fF} + 2 \cdot 0.85\,\mathrm{fF} + 3 \cdot 0.85\,\mathrm{fF} + 4 \cdot 3.47\,\mathrm{fF}) = 85\,\mathrm{ps}$

The simulated delay for this particular transition was found to be 86 psec! The hand analysis
gives a fairly accurate estimate given all assumptions and linearizations made. For example,
we assume that the gate-source (or gate-drain) capacitance only consists of the overlap
component. This is not entirely the case, as during the transition some other contributions
come into play depending upon the operating region. Once again, the goal of hand analysis is
not to provide a totally accurate delay prediction, but rather to give intuition into what
factors influence the delay and to aid in initial transistor sizing. Accurate timing analysis
and transistor optimization is usually done using SPICE. The simulated worst-case low-to-high
delay time for this gate was 106 ps.
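The hand calculation of this example can be replayed numerically. The sketch below is only an illustration: the dictionary keys are this sketch's reading of the factors that appear in the expressions of Table 6.2 (linearization factor, capacitance per area, per perimeter, and per width), and the equivalent NMOS resistance of 6.5 kΩ is an assumption, chosen because it reproduces the 85 ps result quoted in the text.

```python
# Factors as they appear in the Table 6.2 expressions (interpretive names).
NMOS = dict(keq=0.57, keqsw=0.61, cj=2.0, cjsw=0.28, c_overlap=0.31)
PMOS = dict(keq=0.79, keqsw=0.86, cj=1.9, cjsw=0.22, c_overlap=0.27)


def junction_cap(dev, area_um2, perimeter_um):
    """Linearized bottom-plate plus sidewall junction capacitance in fF."""
    return (dev["keq"] * area_um2 * dev["cj"]
            + dev["keqsw"] * perimeter_um * dev["cjsw"])


def overlap_cap(dev, width_um):
    """Gate overlap capacitance in fF."""
    return dev["c_overlap"] * width_um


# Internal nodes C1 = C2 = C3: two junctions plus Miller-doubled overlaps.
c_internal = 2 * junction_cap(NMOS, 0.0625, 0.25) + 4 * overlap_cap(NMOS, 0.5)

# Output node: NMOS M4 drain, four PMOS drains, Miller-doubled overlaps.
c_load = (junction_cap(NMOS, 0.3125, 1.75) + 2 * overlap_cap(NMOS, 0.5)
          + 4 * junction_cap(PMOS, 0.171875, 0.875)
          + 4 * 2 * overlap_cap(PMOS, 0.375))

R_N_OHM = 6.5e3  # assumed equivalent resistance of one NMOS device

# Eq. (6.4); fF * ohm = 1e-15 s, so multiply by 1e-3 to express in ps.
t_phl_ps = 0.69 * R_N_OHM * (c_internal * (1 + 2 + 3) + 4 * c_load) * 1e-3

print(f"C1..C3 = {c_internal:.2f} fF, CL = {c_load:.2f} fF, tpHL = {t_phl_ps:.0f} ps")
```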



While complementary CMOS is a very robust and simple approach for implementing
logic gates, there are two major problems associated with using this style as the
complexity of the gate (i.e., fan-in) increases. First, the number of transistors required to
implement an N fan-in gate is 2N. This can result in significant implementation area. The
second problem is that the propagation delay of a complementary CMOS gate deteriorates
rapidly as a function of the fan-in. The large number of transistors (2N) increases the
overall capacitance of the gate. For an N-input NAND gate, the output capacitance increases
linearly with the fan-in, since the number of PMOS devices connected to the output node
increases linearly with the fan-in. Also, a series connection of transistors in either the
PUN or PDN slows the gate as well, because the effective (dis)charging resistance is
increased. For the same N-input NAND gate, the effective resistance of the PDN path
increases linearly with the fan-in. Since the output capacitance increases linearly and the
pull-down resistance increases linearly, the high-to-low delay can increase in a quadratic
fashion.

The fan-out has a large impact on the delay of complementary CMOS logic as well. 
Each input to a CMOS gate connects to both an NMOS and a PMOS device, and presents 
a load to the driving gate equal to the sum of the gate capacitances. 

The above observations are summarized by the following formula, which approxi- 
mates the influence of fan-in and fan-out on the propagation delay of the complementary 
CMOS gate 



$t_p = a_1 FI + a_2 FI^2 + a_3 FO$    (6.5)

where FI and FO are the fan-in and fan-out of the gate, respectively, and $a_1$, $a_2$, and
$a_3$ are weighting factors that are a function of the technology.

At first glance, it would appear that the increase in resistance for larger fan-in can be
solved by making the devices in the transistor chain wider. Unfortunately, this does not
improve the performance as much as expected, since widening a device also increases its
gate and diffusion capacitances, and has an adverse effect on the gate performance. For
the N-input NAND gate, the low-to-high delay only increases linearly, since the pull-up
resistance remains unchanged and only the capacitance increases linearly.

Figure 6.13 shows the propagation delay for both transitions as a function of fan-in,
assuming a fixed fan-out (NMOS: 0.5 µm and PMOS: 1.5 µm). As predicted above,
$t_{pLH}$ increases linearly due to the linearly increasing value of the output capacitance.
The simultaneous increase in the pull-down resistance and the load capacitance results in an
approximately quadratic relationship for $t_{pHL}$. Gates with a fan-in greater than or equal
to 4 become excessively slow and must be avoided.
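The linear and quadratic trends can be seen in a very crude model of an N-input NAND gate. This is a sketch under simplifying assumptions, not the simulation behind Figure 6.13: internal node capacitances are ignored, the output capacitance is modeled as a fixed part plus a per-input part, and all names and numbers are hypothetical.

```python
def nand_fanin_delays(n, r_n, r_p, c_fixed, c_per_input):
    """First-order (t_pHL, t_pLH) of an n-input NAND with unsized devices."""
    c_out = c_fixed + n * c_per_input   # one PMOS drain per input on the output
    t_phl = 0.69 * (n * r_n) * c_out    # n NMOS in series: resistance grows with n
    t_plh = 0.69 * r_p * c_out          # worst case: a single PMOS pulls up
    return t_phl, t_plh


# Hypothetical device values; only the trend matters.
for n in (2, 4, 8):
    t_phl, t_plh = nand_fanin_delays(n, r_n=6.5e3, r_p=13e3,
                                     c_fixed=0.0, c_per_input=1e-15)
    print(f"N={n}: tpHL={t_phl * 1e12:.1f} ps, tpLH={t_plh * 1e12:.1f} ps")
```

With the fixed capacitance set to zero, doubling the fan-in doubles $t_{pLH}$ but quadruples $t_{pHL}$.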



Design Techniques for Large Fan-in 



Several approaches may be used to reduce delays in large fan-in circuits.

Figure 6.13 Propagation delay of CMOS NAND gate as a function of fan-in. A fan-out of one
inverter is assumed, and all pull-down transistors are minimal size.



1. Transistor Sizing 

The most obvious solution is to increase the overall transistor size. This lowers the
resistance of devices in series and lowers the time constant. However, increasing the
transistor size results in larger parasitic capacitors, which not only affect the propagation
delay of the gate in question, but also present a larger load to the preceding gate. This
technique should, therefore, be used with caution. If the load capacitance is dominated by
the intrinsic capacitance of the gate, widening the device only creates a "self-loading"
effect, and the propagation delay is unaffected. A more comprehensive approach towards sizing
transistors in complex CMOS gates is discussed in the next section.

2. Progressive Transistor Sizing 



Figure 6.14 Progressive sizing of transistors in large transistor chains copes with the
extra load of internal capacitances ($M_1 > M_2 > M_3 > \dots > M_N$).

Figure 6.15 Influence of transistor ordering on delay. Signal $In_1$ is the critical signal.

An alternate approach to uniform sizing (in which each transistor is scaled up
uniformly) is to use progressive transistor sizing (Figure 6.14). Referring back to Eq.
(6.3), we see that the resistance of $M_1$ ($R_1$) appears N times in the delay equation,
the resistance of $M_2$ ($R_2$) appears N − 1 times, etc. From the equation, it is clear
that $R_1$ should be made the smallest, $R_2$ the next smallest, etc. Consequently, a
progressive scaling of the transistors is beneficial: $M_1 > M_2 > M_3 > \dots > M_N$.
Basically, in this approach, the important resistance is reduced while reducing capacitance.
For an excellent treatment on the optimal sizing of transistors in a complex network, we
refer the interested reader to [Shoji88, pp. 131-143]. The reader should be aware of one
important pitfall of this approach. While progressive resizing of transistors is relatively
easy in a schematic diagram, it is not as simple in a real layout. Very often, design-rule
considerations force the designer to push the transistors apart, which causes the internal
capacitance to grow. This may offset all the gains of the resizing!
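The argument for progressive sizing rests on how often each resistance appears in the Elmore sum. The sketch below regroups Eq. (6.3) by resistance instead of by capacitance; the function names are illustrative, and the unit values are chosen only to expose the weights.

```python
def resistance_weights(capacitances):
    """Total capacitance multiplying each R_i when Eq. (6.3) is regrouped.

    R_i sits in the discharge path of node i and of every node above it, so
    its weight is the sum of those capacitances (the last entry is C_L).
    """
    return [sum(capacitances[i:]) for i in range(len(capacitances))]


def elmore_from_weights(resistances, capacitances):
    """Same delay as Eq. (6.3), written as a weighted sum over resistances."""
    weights = resistance_weights(capacitances)
    return 0.69 * sum(r * w for r, w in zip(resistances, weights))


# With equal capacitances, R_1 appears N times, R_2 appears N - 1 times, etc.
print(resistance_weights([1, 1, 1, 1]))                     # [4, 3, 2, 1]

# Halving R_1 helps more than halving R_4 (capacitance changes are ignored).
print(elmore_from_weights([0.5, 1, 1, 1], [1, 1, 1, 1]))
print(elmore_from_weights([1, 1, 1, 0.5], [1, 1, 1, 1]))
```

The comparison ignores the capacitance added by widening a device, which is exactly the layout pitfall described above.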

3. Input Re-Ordering 

Some signals in complex combinational logic blocks might be more critical than others. 
Not all inputs of a gate arrive at the same time (due, for instance, to the propagation delays of 
the preceding logical gates). An input signal to a gate is called critical if it is the last signal of 
all inputs to assume a stable value. The path through the logic which determines the ultimate 
speed of the structure is called the critical path. 

Putting the critical-path transistors closer to the output of the gate can result in a
speed-up. This is demonstrated in Figure 6.15. Signal $In_1$ is assumed to be a critical
signal. Suppose further that $In_2$ and $In_3$ are high and that $In_1$ undergoes a 0→1
transition. Assume also that $C_L$ is initially charged high. In case (a), no path to GND
exists until $M_1$ is turned on, which is unfortunately the last event to happen. The delay
between the arrival of $In_1$ and the output is therefore determined by the time it takes to
discharge $C_L$, $C_1$, and $C_2$. In the second case, $C_1$ and $C_2$ are already
discharged when $In_1$ changes. Only $C_L$ still has to be discharged, resulting in a smaller
delay.
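The benefit of re-ordering can be expressed as the capacitance that is still charged when the critical input arrives. The following is a deliberately simple sketch: the function name is hypothetical, and the capacitance values are borrowed from Example 6.4 purely as sample numbers.

```python
def capacitance_left_to_discharge(c_load, internal_caps, critical_near_output):
    """Capacitance that must still be discharged once the critical input rises.

    If the critical transistor sits next to the output, the internal nodes
    have already been discharged through the transistors below it.
    """
    if critical_near_output:
        return c_load
    return c_load + sum(internal_caps)


# Sample values in fF (taken from Example 6.4 for illustration only).
far = capacitance_left_to_discharge(3.47, [0.85, 0.85], critical_near_output=False)
near = capacitance_left_to_discharge(3.47, [0.85, 0.85], critical_near_output=True)
print(f"critical input far from output: {far:.2f} fF, near output: {near:.2f} fF")
```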

4. Logic Restructuring 

Manipulating the logic equations can reduce the fan-in requirements and hence reduce
the gate delay, as illustrated in Figure 6.16. The quadratic dependency of the gate delay on
fan-in makes the six-input NOR gate extremely slow. Partitioning the NOR gate into two
three-input gates results in a significant speed-up, which offsets by far the extra delay
incurred by turning the inverter into a two-input NAND gate.

Figure 6.16 Logic restructuring can reduce the gate fan-in.

Transistor Sizing for Performance in Combinational Networks

Earlier, we established that minimization of the propagation delay of a gate in isolation is
a purely academic effort. The sizing of devices should happen in its proper context. In
Chapter 5, we developed a methodology to do so for inverters. There, we found that the
optimal fanout for a chain of inverters driving a load $C_L$ is $(C_L/C_{in})^{1/N}$, where
N is the number of stages in the chain, and $C_{in}$ the input capacitance of the first gate
in the chain. If we have an opportunity to select the number of stages, we found that we
would like to keep the fanout per stage around 4. Can this result be extended to determine
the size of any combinational path for minimal delay? By extending our previous
approach to address complex logic networks, we will find out that this is indeed possible
[Sutherland99].¹

To do so, we modify the basic delay equation of the inverter, introduced in Chapter
5, and repeated here for the sake of clarity,

$t_p = t_{p0}\left(1 + \frac{C_{ext}}{\gamma C_g}\right) = t_{p0}(1 + f/\gamma)$    (6.6)

to

$t_p = t_{p0}(p + g f/\gamma)$    (6.7)

with $t_{p0}$ still representing the intrinsic delay of an inverter, and $f$ the ratio
between the external load and the input capacitance of the gate. In this context, $f$ is
often called the electrical effort. $p$ represents the ratio of the intrinsic (or unloaded)
delays of the complex gate and the simple inverter. The more involved structure of the
multiple-input gate, combined with its series devices, increases its intrinsic delay. $p$ is
a function of gate topology as well as layout style. Table 6.3 enumerates the values of $p$
for some standard gates, assuming simple layout styles, and ignoring second-order effects
such as internal node capacitances.

Table 6.3 Estimates of intrinsic delay factors of various logic types assuming simple layout
styles, and a fixed PMOS/NMOS ratio.

Gate type             p
Inverter              1
n-input NAND          n
n-input NOR           n
n-way multiplexer     2n
XOR, NXOR             $n 2^{n-1}$



¹ The approach introduced in this section is commonly called logical effort, and was first
introduced in [Sutherland99], which presents an extensive treatment of the topic. The
treatment offered here represents only a glance-over of the overall approach.

The factor $g$ is called the logical effort, and represents the fact that, for a given load,
complex gates have to work harder than an inverter to produce a similar response. In other
words, the logical effort of a logic gate tells how much worse it is at producing output
current than an inverter, given that each of its inputs may contain only the same input
capacitance as the inverter. Equivalently, logical effort is how much more input capacitance
a gate presents to deliver the same output current as an inverter. Logical effort is a useful
parameter, because it depends only on circuit topology. The logical efforts of some common
logic gates are given in Table 6.4.



Table 6.4 Logical efforts of common logic gates, assuming a PMOS/NMOS ratio of 2.

                 Number of Inputs
Gate Type        1      2      3      n
Inverter         1
NAND                    4/3    5/3    (n+2)/3
NOR                     5/3    7/3    (2n+1)/3
Multiplexer             2      2      2
XOR                     4      12



Example 6.5 Logical effort of complex gates 

Consider the gates shown in Figure 6.17. Assuming a PMOS/NMOS ratio of 2, the input
capacitance of a minimum-sized symmetrical inverter equals 3 times the gate capacitance of a
minimum-sized NMOS (called $C_{unit}$). We size the 2-input NAND and NOR such that their
equivalent resistances equal the resistance of the inverter (using the techniques described
earlier). This increases the input capacitance of the 2-input NAND to $4 C_{unit}$, or 4/3
the capacitance of the inverter. The input capacitance of the 2-input NOR is 5/3 that of the
inverter. Equivalently, for the same input capacitance, the NAND and NOR gates have 4/3 and
5/3 times less driving strength than the inverter. This affects the delay component that
corresponds to the load, increasing it by this same factor, called 'logical effort.' Hence,
$g_{NAND} = 4/3$, and $g_{NOR} = 5/3$.
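The NAND and NOR entries of Table 6.4 follow from the sizing argument of this example. A small sketch, assuming first-order sizing (series devices widened by the stack depth) and a hypothetical function name, computes the logical effort as the ratio of input capacitances in units of $C_{unit}$:

```python
from fractions import Fraction


def logical_effort(gate, n_inputs=1, pn_ratio=2):
    """Input capacitance of one gate input divided by that of the inverter.

    Widths are in units of a minimum NMOS; pn_ratio must be an int or Fraction.
    The gate is sized so its worst-case resistances equal the inverter's.
    """
    inverter_cin = 1 + pn_ratio              # NMOS width 1 + PMOS width pn_ratio
    if gate == "INV":
        cin = inverter_cin
    elif gate == "NAND":                     # series NMOS widened n times
        cin = n_inputs + pn_ratio
    elif gate == "NOR":                      # series PMOS widened n times
        cin = 1 + n_inputs * pn_ratio
    else:
        raise ValueError("gate must be 'INV', 'NAND' or 'NOR'")
    return Fraction(cin, inverter_cin)


print(logical_effort("NAND", 2), logical_effort("NOR", 2))   # 4/3 5/3
print(logical_effort("NAND", 3), logical_effort("NOR", 3))   # 5/3 7/3
```

For a ratio of 2 this gives (n+2)/3 for the NAND and (2n+1)/3 for the NOR, in agreement with the table.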





Figure 6.17 Logical effort of 2-input NAND and NOR gates (panels: inverter, 2-input NAND,
2-input NOR).

The delay model of a logic gate, as represented in Eq. (6.7), is a simple linear
relationship. Figure 6.18 shows this relationship graphically: the delay is plotted as a
function of the fanout (electrical effort) for an inverter and for a 2-input NAND gate. The
slope of the line is the logical effort of the gate; its intercept is the intrinsic delay.
The graph shows that we can adjust the delay by adjusting the effective fanout (by transistor
sizing) or by choosing a logic gate with a different logical effort. Observe also that fanout
and logical effort contribute to the delay in a similar way. We call the product of the two,
$h = fg$, the gate effort.
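Eq. (6.7) is simple enough to evaluate directly. The sketch below is illustrative: the function name is an assumption, delays are expressed in units of $t_{p0}$, and the default $\gamma = 1$ is assumed here rather than taken from this section.

```python
def gate_delay(t_p0, p, g, f, gamma=1.0):
    """Delay of a gate per Eq. (6.7): t_p = t_p0 * (p + g * f / gamma).

    p: intrinsic delay factor (Table 6.3), g: logical effort (Table 6.4),
    f: electrical effort (external load / input capacitance).
    """
    return t_p0 * (p + g * f / gamma)


# Inverter versus 2-input NAND, both driving three times their input capacitance.
print(gate_delay(1.0, p=1, g=1, f=3))        # inverter: 1 + 3 = 4
print(gate_delay(1.0, p=2, g=4 / 3, f=3))    # NAND2:    2 + 4 = 6
```

The NAND line has both a higher intercept ($p$) and a steeper slope ($g$), as in Figure 6.18.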

Figure 6.18 Delay as a function of fanout for an inverter and a 2-input NAND.

The total delay of a path through a combinational logic block can now be expressed as

$t_p = \sum_{j=1}^{N} t_{p,j} = t_{p0} \sum_{j=1}^{N} \left(p_j + \frac{f_j g_j}{\gamma}\right)$    (6.8)



We use a procedure similar to the one used for the inverter chain in Chapter 5 to determine
the minimum delay of the path. By finding N − 1 partial derivatives and setting them to zero,
we find that each stage should bear the same 'effort':

$f_1 g_1 = f_2 g_2 = \dots = f_N g_N$    (6.9)

The fanouts along the path can be multiplied to get a path effective fanout, and so can the
logical efforts.

$F = f_1 f_2 \cdots f_N$, $G = g_1 g_2 \cdots g_N$    (6.10)

The path effort can then be defined as the product of the two, or $H = FG$. From here on, the
analysis proceeds along the same lines as the inverter chain. The gate effort that minimizes
the path delay is found to equal

$h = \sqrt[N]{FG} = \sqrt[N]{H}$,    (6.11)

and the minimum delay through the path is

$D = t_{p0} \left( \sum_{j=1}^{N} p_j + \frac{N \sqrt[N]{H}}{\gamma} \right)$    (6.12)

Note that the overall intrinsic delay is a function of the types of logic gates in the path,
and is not affected by the sizing.

Example 6.6 Sizing combinational logic for minimum delay 

Consider the logic network of Figure 6.19, which may represent the critical path of a more
complex logic block. The output of the network is loaded with a capacitance which is 5 times
larger than the input capacitance of the first gate, which is a minimum-sized inverter. The
effective fanout of the path hence equals $F = C_L/C_{g1} = 5$. Using the entries in Table
6.4, we find the path logical effort

$G = 1 \times \frac{5}{3} \times \frac{5}{3} \times 1 = \frac{25}{9}$

$H = FG = 125/9$, and the optimal stage effort $h$ is $\sqrt[4]{H} = 1.93$. Taking into
account the gate types, we derive the fanout factors: $f_1 = 1.93$;
$f_2 = 1.93 \times (3/5) = 1.16$; $f_3 = 1.16$; $f_4 = 1.93$. Notice that the inverters are
assigned larger electrical efforts than the more complex gates because they are better at
driving loads. From this, we can derive the sizes of the gates (with respect to their
minimum-sized versions): $a = f_1 g_1/g_2 = 1.16$; $b = f_1 f_2 g_1/g_3 = 1.34$;
$c = f_1 f_2 f_3 g_1/g_4 = 2.60$.

These calculations do not have to be very precise. As discussed in Chapter 5, sizing
a gate too large or too small by a factor of 1.5 still results in circuits within 5% of
minimum delay. Therefore, the "back of the envelope" hand calculations using this technique
are quite effective.
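The steps of this example (path effort, stage effort, per-stage fanout, gate sizes) can be sketched in a few lines. The function name is illustrative, and the sketch assumes a single path without branching, as in the example.

```python
from math import prod


def size_path(logical_efforts, path_fanout):
    """Equal-effort sizing of a path without branching.

    Returns (stage effort h, per-stage fanouts f_i, gate sizes s_i), where
    s_i is relative to the minimum-sized version of gate i and s_1 = 1.
    """
    n = len(logical_efforts)
    path_effort = path_fanout * prod(logical_efforts)      # H = F * G
    stage_effort = path_effort ** (1.0 / n)                # h = H^(1/N)
    fanouts = [stage_effort / g for g in logical_efforts]  # f_i = h / g_i
    sizes = [1.0]
    for i in range(1, n):
        # s_i = (g_1 / g_i) * f_1 * ... * f_(i-1)
        sizes.append(logical_efforts[0] / logical_efforts[i] * prod(fanouts[:i]))
    return stage_effort, fanouts, sizes


# Logical efforts used in the example: 1, 5/3, 5/3, 1 with F = 5.
h, f, s = size_path([1, 5 / 3, 5 / 3, 1], path_fanout=5)
print(f"h = {h:.2f}")
print("fanouts:", [round(x, 2) for x in f])
print("sizes:  ", [round(x, 2) for x in s])
```

The computed sizes agree with the hand calculation to within rounding (the last size evaluates to about 2.59 rather than 2.60).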




Figure 6.19 Critical path of 
combinational network. 



Power Consumption in CMOS Logic Gates 

The sources of power consumption in a complementary CMOS inverter were discussed in
detail in Chapter 5. Many of these issues apply directly to complex CMOS gates. The
power dissipation is a strong function of transistor sizing (which affects physical
capacitance), input and output rise/fall times (which affect the short-circuit power), device
thresholds and temperature (which affect leakage power), and switching activity. The
dynamic power dissipation is given by $\alpha_{0 \to 1} C_L V_{DD}^2 f$. Making a gate more
complex mostly affects the switching activity $\alpha_{0 \to 1}$, which has two components: a
static component that is only a function of the topology of the logic network, and a dynamic
one that results from the timing behavior of the circuit; the latter factor is also called
glitching.

Logic Function — The transition activity is a strong function of the logic function being 
implemented. For static CMOS gates with statistically independent inputs, the static 
transition probability is the probability $p_0$ that the output will be in the zero state in
one cycle, multiplied by the probability $p_1$ that the output will be in the one state in
the next cycle:

$\alpha_{0 \to 1} = p_0 p_1 = p_0 (1 - p_0)$    (6.13)

Assuming that the inputs are independent and uniformly distributed, any N-input static
gate has a transition probability that corresponds to

$\alpha_{0 \to 1} = \frac{N_0}{2^N} \cdot \frac{N_1}{2^N} = \frac{N_0 (2^N - N_0)}{2^{2N}}$    (6.14)

where $N_0$ is the number of zero entries and $N_1$ is the number of one entries in the
output column of the truth table of the function. To illustrate, consider a static 2-input
NOR gate whose truth table is shown in Table 6.5. Assume that only one input transition is
possible during a clock cycle, and that the inputs to the NOR gate have a uniform input
distribution; that is, the four possible states for inputs A and B (00, 01, 10, 11) are
equally likely.



Table 6.5 Truth table of a 2-input NOR gate.

A    B    Out
0    0    1
0    1    0
1    0    0
1    1    0



From Table 6.5 and Eq. (6.14), the output transition probability of a 2-input static
CMOS NOR gate can be derived:

$\alpha_{0 \to 1} = \frac{N_0 (2^N - N_0)}{2^{2N}} = \frac{3 (2^2 - 3)}{2^{2 \cdot 2}} = \frac{3}{16}$    (6.15)
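Eq. (6.14) only needs the number of zeros and ones in the output column of the truth table, so it can be evaluated for any small gate by enumeration. A minimal sketch, with an illustrative function name and assuming independent, uniformly distributed inputs as in the text:

```python
from fractions import Fraction
from itertools import product


def static_transition_probability(func, n_inputs):
    """0->1 transition probability per Eq. (6.14): N0 * N1 / 2^(2N)."""
    outputs = [func(*bits) for bits in product((0, 1), repeat=n_inputs)]
    n0 = outputs.count(0)   # zero entries in the truth-table output column
    n1 = outputs.count(1)   # one entries
    return Fraction(n0 * n1, 4 ** n_inputs)


def nor2(a, b):
    return int(not (a or b))


def xor2(a, b):
    return a ^ b


print(static_transition_probability(nor2, 2))   # 3/16
print(static_transition_probability(xor2, 2))   # 1/4
```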



Problem 6.2 N input XOR gate 

Assuming the inputs to an N-input XOR gate are uncorrelated and uniformly distributed,
derive the expression for the switching activity factor.



Signal Statistics — The switching activity of a logic gate is a strong function of the input 
signal statistics. Using a uniform input distribution to compute activity is not a good
approach, since the propagation through logic gates can significantly modify the signal
statistics. For example, consider once again a 2-input static NOR gate, and let $p_a$ and
$p_b$ be the probabilities that the inputs A and B are one. Assume further that the inputs
are not correlated. The probability that the output node equals one is given by

$p_1 = (1 - p_a)(1 - p_b)$    (6.16)

Therefore, the probability of a transition from 0 to 1 is

$\alpha_{0 \to 1} = p_0 p_1 = (1 - (1 - p_a)(1 - p_b))(1 - p_a)(1 - p_b)$    (6.17)

Figure 6.20 Transition activity of a two-input NOR gate as a function of the input
probabilities ($p_a$, $p_b$).

Figure 6.20 shows the transition probability as a function of $p_a$ and $p_b$. Observe how
this graph degrades into the simple inverter case when one of the input probabilities is set
to 0. From this plot, it is clear that understanding the signal statistics and their impact on
switching events can be used to significantly impact the power dissipation.
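Eqs. (6.16) and (6.17) can be evaluated over a grid of input probabilities to reproduce the shape described for Figure 6.20. This is a small sketch with an illustrative function name; the inputs are assumed uncorrelated, as in the text.

```python
def nor2_transition_probability(p_a, p_b):
    """0->1 transition probability of a 2-input NOR with uncorrelated inputs."""
    p_1 = (1 - p_a) * (1 - p_b)   # Eq. (6.16): output is 1 only if A = B = 0
    return (1 - p_1) * p_1        # Eq. (6.17): p_0 * p_1


# Uniform inputs give the 3/16 of Eq. (6.15).
print(nor2_transition_probability(0.5, 0.5))   # 0.1875

# With p_b = 0 the gate acts as an inverter of A: alpha = p_a * (1 - p_a).
for p_a in (0.1, 0.5, 0.9):
    print(p_a, nor2_transition_probability(p_a, 0.0))
```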



Problem 6.3 Power Dissipation of Basic Logic Gates 

Derive the 0→1 output transition probabilities for the basic logic gates (AND, OR, XOR).
The results to be obtained are given in Table 6.6.



Table 6.6 Output transition probabilities for static logic gates.

Gate    $\alpha_{0 \to 1}$
AND     $(1 - p_a p_b) p_a p_b$
OR      $(1 - p_a)(1 - p_b)[1 - (1 - p_a)(1 - p_b)]$
XOR     $[1 - (p_a + p_b - 2 p_a p_b)](p_a + p_b - 2 p_a p_b)$



Inter-signal Correlations — The evaluation of the switching activity is further 
complicated by the fact that signals exhibit correlation in space and time. Even if the 
primary inputs to a logic network are uncorrelated, the signals become correlated or 
“colored”, as they propagate through the logic network. This is best illustrated with a 
simple example. Consider first the circuit shown in Figure 6.21a, and assume that the 
primary inputs, A and B, are uncorrelated and uniformly distributed. Node C has a 1 (0)
probability of 1/2, and a 0→1 transition probability of 1/4. The probability that the node Z
undergoes a power-consuming transition is then determined using the AND-gate expression of
Table 6.6.

$p_{0 \to 1} = (1 - p_a p_b) p_a p_b = (1 - 1/2 \cdot 1/2) \cdot 1/2 \cdot 1/2 = 3/16$    (6.18)

Figure 6.21 Example illustrating the effect of signal correlations: (a) logic circuit without
reconvergent fanout, (b) logic circuit with reconvergent fanout.



The computation of the probabilities is straightforward: signal and transition
probabilities are evaluated in an ordered fashion, progressing from the input to the output
node. This approach, however, has two major limitations: (1) it does not deal with circuits
with feedback as found in sequential circuits; (2) it assumes that the signal probabilities at
the input of each gate are independent. This is rarely the case in actual circuits, where
reconvergent fanout often causes inter-signal dependencies. For instance, the inputs to the
AND gate in Figure 6.21b (C and B) are inter-dependent, as both are a function of A. The
approach to compute probabilities presented previously fails under these circumstances.
Traversing from inputs to outputs yields a transition probability of 3/16 for node Z, similar
to the previous analysis. This value for the transition probability is clearly false, as
logic transformations show that the network can be reduced to
$Z = C \cdot B = A \cdot \overline{A} = 0$, and no transition will ever take place.

To get the precise results in the progressive analysis approach, its is essential to take 
signal inter-dependencies into account. This can be accomplished with the aid of condi- 
tional probabilities. For an AND gate, Z equals 1 if and only if B and C are equal to 1. 

Pz = p(Z=l)=p(B=l,C=l) (6.19) 

where p(B=\ ,C=\ ) represents the probability that B and C are equal to 1 simultaneously. If 
B and C are independent, p(B=l,C=l) can be decomposed into p<B= I ) • p(C= I ), and this 
yields the expression for the AND-gate, derived earlier: p z = p(B= 1) • p(C=l) = p H p c . If a 
dependency between the two exists (as is the case in Figure 6.21b), a conditional probabil- 
ity has to be employed, such as 

p z = p(C=l\B=l)*p(B=l) (6.20) 

The first factor in Eq. (6.20) represents the probability that C = 1 given that B = 1. The
extra condition is necessary as C is dependent upon B. Inspection of the network shows
that this probability is equal to 0, since C and B are logical inversions of each other,
resulting in the signal probability for Z, p_Z = 0.
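
The gap between the independence assumption and the exact result can be checked by brute-force enumeration. The sketch below is only illustrative: it assumes a stand-in for Figure 6.21b in which B = A and C = NOT A with p(A=1) = 0.5 (the actual gates of the figure are not reproduced here), and the helper name `signal_probability` is hypothetical.

```python
from fractions import Fraction

def signal_probability(func, p_inputs):
    """Exact p(output = 1), enumerating all combinations of independent inputs."""
    n = len(p_inputs)
    total = Fraction(0)
    for code in range(2 ** n):
        bits = [(code >> i) & 1 for i in range(n)]
        weight = Fraction(1)
        for bit, p in zip(bits, p_inputs):
            weight *= p if bit else 1 - p
        if func(*bits):
            total += weight
    return total

# Assumed stand-in network: B = A, C = NOT A, Z = B AND C.
p_a = Fraction(1, 2)
p_b = signal_probability(lambda a: a, [p_a])
p_c = signal_probability(lambda a: 1 - a, [p_a])

# Progressive analysis that (wrongly) treats B and C as independent.
naive_p_z = p_b * p_c
naive_activity = (1 - naive_p_z) * naive_p_z      # 0->1 transition probability

# Exact analysis expressed in terms of the primary input A only.
exact_p_z = signal_probability(lambda a: a & (1 - a), [p_a])

print(naive_activity, exact_p_z)
```

Under these assumptions the naive traversal reports an activity of 3/16, while the exact enumeration gives a signal probability of 0 for Z, in line with the conditional-probability argument.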

Deriving those expressions in a structured way for large networks with reconvergent 
fanout is complex, especially when the networks contain feedback loops. Computer sup- 
port is therefore essential. To be meaningful, the analysis program has to process a typical 
sequence of input signals, as the power dissipation is a strong function of statistics of those 
signals. 

Dynamic or Glitching Transitions — When analyzing the transition probabilities of 
complex, multistage logic networks in the preceding section, we ignored the fact that the 
gates have a non-zero propagation delay. In reality, the finite propagation delay from one 
logic block to the next can cause spurious transitions, called glitches, critical races, or
dynamic hazards, to occur: a node can exhibit multiple transitions in a single clock cycle
before settling to the correct logic level. 

A typical example of the effect of glitching is shown in Figure 6.22, which displays 
the simulated response of a chain of NAND gates for all inputs going simultaneously from
0 to 1. Initially, all the outputs are 1 since one of the inputs was 0. For this particular
transition, all the odd bits must transition to 0 while the even bits remain at the value of 1.
However, due to the finite propagation delay, the higher order even outputs start to dis- 
charge and the voltage drops. When the correct input ripples through the network, the out- 
put goes high. The glitch on the even bits causes extra power dissipation beyond what is 
required to strictly implement the logic function. Although the glitches in this example are 
only partial (i.e., not from rail to rail), they contribute significantly to the power dissipa- 
tion. Long chains of gates often occur in important structures such as adders and multipli- 
ers and the glitching component can easily dominate the overall power consumption. 



Design Techniques to Reduce Switching Activity 



The dynamic power of a logic gate can be reduced by minimizing the physical capacitance and 
the switching activity. The physical capacitance can be minimized in a number of ways, including
circuit style selection, transistor sizing, placement and routing, and architectural optimizations.
The switching activity, on the other hand, can be minimized at all levels of the design
abstraction, and is the focus of this section. Logic structures can be optimized to minimize both the
fundamental transitions required to implement a given function, and the spurious transitions. 

1. Logic Restructuring 

Changing the topology of a logic network may reduce its power dissipation. Consider for
instance two alternate implementations of F = A · B · C · D, as shown in Figure 6.23. Ignore
glitching and assume that all primary inputs (A, B, C, D) are uncorrelated and uniformly
distributed (i.e., p_A = p_B = p_C = p_D = 0.5). Using the expressions from Table 6.6, the activity
can be computed for the two topologies, as shown in Table 6.7. The results indicate that the
chain implementation will have an overall lower switching activity than the tree implementation
for random inputs. However, as mentioned before, it is also important to consider the timing
behavior to accurately make power trade-offs. In this example the tree topology will have lower
(no) glitching activity since the signal paths are balanced to all the gates.

Figure 6.23 Simple example to demonstrate the influence of circuit topology on activity
(tree structure versus chain structure).



Table 6.7 Probabilities for tree and chain topologies.

                          O1       O2       F
p1 (chain)                1/4      1/8      1/16
p0 = 1 - p1 (chain)       3/4      7/8      15/16
p0->1 (chain)             3/16     7/64     15/256
p1 (tree)                 1/4      1/4      1/16
p0 = 1 - p1 (tree)        3/4      3/4      15/16
p0->1 (tree)              3/16     3/16     15/256
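
The entries of Table 6.7 can be reproduced with a few lines of code. This is a minimal sketch, assuming two-input AND gates with independent inputs of probability 0.5 and the static-gate activity expression p0 · p1; the function names are illustrative only.

```python
from fractions import Fraction

def and2(p_a, p_b):
    """p(output = 1) of a 2-input AND gate with independent inputs."""
    return p_a * p_b

def activity(p1):
    """0->1 transition probability of a static gate output: p0 * p1."""
    return (1 - p1) * p1

half = Fraction(1, 2)

# Chain: O1 = A.B, O2 = O1.C, F = O2.D
chain = [and2(half, half)]
chain.append(and2(chain[0], half))
chain.append(and2(chain[1], half))

# Tree: O1 = A.B, O2 = C.D, F = O1.O2
tree = [and2(half, half), and2(half, half)]
tree.append(and2(tree[0], tree[1]))

chain_total = sum(activity(p) for p in chain)
tree_total = sum(activity(p) for p in tree)
print(chain_total, tree_total)
```

Summing the three node activities gives 91/256 for the chain and 111/256 for the tree, which is the sense in which the chain has the lower overall switching activity when glitching is ignored.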



2. Input ordering 

Consider the two static logic circuits of Figure 6.24. The probabilities of A, B and C being 1 are 
listed in the Figure. Since both circuits implement identical logic functionality, it is clear that 
the activity at the output node Z is equal in both cases. The difference is in the activity at the 
intermediate node. In the first circuit, this activity equals (1 - 0.5 × 0.2)(0.5 × 0.2) = 0.09. In
the second case, the probability that a 0 → 1 transition occurs equals (1 - 0.2 × 0.1)(0.2 × 0.1)
= 0.0196. This is substantially lower. From this we learn that it is beneficial to postpone the
introduction of signals with a high transition rate (i.e., signals with a signal probability close to 
0.5). A simple reordering of the input signals is often sufficient to accomplish that goal. 
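
The two numbers above, and the choice of which input pair to combine first, can be checked with a short script. This sketch assumes the intermediate node is the output of a two-input AND of independent signals (consistent with the product form used in the text); the function name is hypothetical.

```python
from itertools import combinations

def intermediate_activity(p_x, p_y):
    """0->1 transition probability at the output of an AND of two independent inputs."""
    p1 = p_x * p_y
    return (1 - p1) * p1

p = {"A": 0.5, "B": 0.2, "C": 0.1}

first_circuit = intermediate_activity(p["A"], p["B"])    # A and B combined first
second_circuit = intermediate_activity(p["B"], p["C"])   # B and C combined first

# Try every pair as the first stage and keep the one with the lowest activity.
best_pair = min(combinations(p, 2),
                key=lambda pair: intermediate_activity(p[pair[0]], p[pair[1]]))
print(first_circuit, second_circuit, best_pair)
```

With the probabilities of Figure 6.24, the search selects B and C for the first stage, which leaves A, the signal closest to a probability of 0.5, for the last stage.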

Figure 6.24 Reordering of inputs affects the circuit activity. Signal probabilities:
P(A = 1) = 0.5, P(B = 1) = 0.2, P(C = 1) = 0.1.

3. Time-multiplexing resources

Time-multiplexing a single hardware resource (such as a logic unit or a bus) over a number of
functions is an often-used technique to minimize the implementation area. Unfortunately, the
minimum-area solution does not always result in the lowest switching activity. For example,
consider the transmission of two input bits (A and B) using either dedicated resources or a
time-multiplexed approach, as shown in Figure 6.25. To first order, ignoring the multiplexer
overhead, it would seem that the degree of time-multiplexing should not affect the switched
capacitance, since the time-multiplexed solution has half the capacitance switched at twice the
frequency (for a fixed throughput).

If the data being transmitted were random, it would make no difference which architecture is
used. However, if the data signals have some distinct properties (called temporal correlation),
the power dissipation of the time-multiplexed solution can be significantly higher. Suppose, for 
instance, that A is always (or mostly) 1 and B is (mostly) 0. In the parallel solution, the 
switched capacitance is very low since there are very few transitions on the data bits. However, 
in the time-multiplexed solution, the bus toggles between 0 and 1. Care must be taken in digital 
systems to avoid time-multiplexing data streams with very distinct data characteristics. 
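
The argument can be illustrated by counting bus transitions for both architectures. The sketch below is a simplified model that counts toggles only (it ignores capacitance values and multiplexer overhead); the stream lengths and helper names are assumptions made for the example.

```python
import random

def count_transitions(stream):
    """Number of bit toggles between consecutive samples on one wire."""
    return sum(a != b for a, b in zip(stream, stream[1:]))

def compare_buses(a_bits, b_bits):
    """Return (toggles on two dedicated wires, toggles on one shared wire)."""
    parallel = count_transitions(a_bits) + count_transitions(b_bits)
    # The shared wire carries A0, B0, A1, B1, ... at twice the rate.
    shared = [bit for pair in zip(a_bits, b_bits) for bit in pair]
    return parallel, count_transitions(shared)

n = 1000

# Strongly biased streams: A is always 1 and B is always 0.
biased = compare_buses([1] * n, [0] * n)

# Random streams: both architectures toggle about equally often.
rng = random.Random(0)
uniform = compare_buses([rng.randint(0, 1) for _ in range(n)],
                        [rng.randint(0, 1) for _ in range(n)])
print(biased, uniform)
```

For the biased streams the dedicated wires never toggle, while the shared wire toggles on every sample, which is the worst case described in the text.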




(a) parallel data transmission (b) serial data transmission 



Figure 6.25 Parallel versus time-multiplexed data busses. 



4. Glitch Reduction by balancing signal paths 

The occurrence of glitching in a circuit is mainly due to a mismatch in the path lengths in the 
network. If all input signals of a gate change simultaneously, no glitching occurs. On the other 
hand, if input signals change at different times, a dynamic hazard might develop. Such a mis- 
match in signal timing is typically the result of different path lengths with respect to the pri- 
mary inputs of the network. This is illustrated in Figure 6.26. Assume that the XOR gate has a 
unit delay. The first network (a) suffers from glitching as a result of the wide disparity between 
the arrival times of the input signals for a gate. For example, for gate F3, one input settles at
time 0, while the second one only arrives at time 2. Redesigning the network so that all arrival 
times are identical can dramatically reduce the number of superfluous transitions (network b). 
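
A small unit-delay simulation makes the difference between the two networks concrete. The sketch below assumes three two-input XOR gates wired either as a chain (F1 = A⊕B, F2 = F1⊕C, F3 = F2⊕D) or as a balanced tree; this wiring is an assumption consistent with the arrival times quoted above, not a transcription of Figure 6.26.

```python
from itertools import product

INPUTS = ("A", "B", "C", "D")
CHAIN = [("F1", ("A", "B")), ("F2", ("F1", "C")), ("F3", ("F2", "D"))]
TREE = [("F1", ("A", "B")), ("F2", ("C", "D")), ("F3", ("F1", "F2"))]

def count_transitions(gates, old, new, steps=4):
    """Count output transitions per gate when the inputs switch from old to new."""
    # Steady state for the old input vector (gates are in topological order).
    state = dict(zip(INPUTS, old))
    for name, (x, y) in gates:
        state[name] = state[x] ^ state[y]
    counts = {name: 0 for name, _ in gates}
    state.update(zip(INPUTS, new))          # all inputs switch at t = 0
    for _ in range(steps):
        nxt = dict(state)
        for name, (x, y) in gates:
            nxt[name] = state[x] ^ state[y]  # each gate sees values one delay old
            counts[name] += int(nxt[name] != state[name])
        state = nxt
    return counts

def total_transitions(gates):
    """Sum of all gate-output transitions over every (old, new) input pair."""
    vectors = list(product((0, 1), repeat=4))
    return sum(sum(count_transitions(gates, old, new).values())
               for old in vectors for new in vectors)

print(count_transitions(CHAIN, (0, 0, 0, 0), (1, 1, 1, 1)))
print(total_transitions(CHAIN), total_transitions(TREE))
```

When all inputs rise together, F3 in the chain toggles twice even though its final value is unchanged, whereas no node in the tree toggles more than once per input change in this model.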



Figure 6.26 Glitching is influenced by matching of signal path lengths. The annotated numbers
indicate the signal arrival times. (a) Network sensitive to glitching; (b) glitch-free network.

Summary

The CMOS logic style described in the previous section is highly robust and scalable with
technology, but requires 2N transistors to implement an N-input logic gate. Also, the load
capacitance is significant, since each gate drives two devices (a PMOS and an NMOS) per
fan-out. This has opened the door for alternative logic families that either are simpler or
faster.



6.2.2 Ratioed Logic 

Concept 

Ratioed logic is an attempt to reduce the number of transistors required to implement a 
given logic function, at the cost of reduced robustness and extra power dissipation. The 
purpose of the PUN in complementary CMOS is to provide a conditional path between 
V_DD and the output when the PDN is turned off. In ratioed logic, the entire PUN is replaced
with a single unconditional load device that pulls up the output for a high output (Figure 
6.27a). Instead of a combination of active pull-down and pull-up networks, such a gate 
consists of an NMOS pull-down network that realizes the logic function, and a simple load 
device. Figure 6.27b shows an example of ratioed logic, which uses a grounded PMOS 
load and is referred to as a pseudo-NMOS gate. 




Figure 6.27 Ratioed logic gate. 



The clear advantage of pseudo-NMOS is the reduced number of transistors (N + 1 
versus 2N for complementary CMOS). The nominal high output voltage (V_OH) for this
gate is V_DD since the pull-down devices are turned off when the output is pulled high
(assuming that V_OL is below V_Tn). On the other hand, the nominal low output voltage is
not 0 V since there is a fight between the devices in the PDN and the grounded PMOS
load device. This results in reduced noise margins and, more importantly, static power
dissipation. The sizing of the load device relative to the pull-down devices can be used to
trade off parameters such as noise margin, propagation delay, and power dissipation. Since
the voltage swing on the output and the overall functionality of the gate depends upon the 
ratio between the NMOS and PMOS sizes, the circuit is called ratioed. This is in contrast 
to the ratioless logic styles, such as complementary CMOS, where the low and high levels 
do not depend upon transistor sizes. 

Computing the dc-transfer characteristic of the pseudo-NMOS proceeds along paths 
similar to those used for its complementary CMOS counterpart. The value of V_OL is
obtained by equating the currents through the driver and load devices for V_in = V_DD. At
this operation point, it is reasonable to assume that the NMOS device resides in linear
mode (since the output should ideally be close to 0 V), while the PMOS load is saturated.



$$k_n \left( (V_{DD} - V_{Tn}) V_{OL} - \frac{V_{OL}^2}{2} \right) = k_p \left( (-V_{DD} - V_{Tp}) \cdot V_{DSATp} - \frac{V_{DSATp}^2}{2} \right) \tag{6.21}$$

Assuming that V_OL is small relative to the gate drive (V_DD - V_T) and that V_Tn is equal
to V_Tp in magnitude, V_OL can be approximated as:

$$V_{OL} \approx \frac{k_p (-V_{DD} - V_{Tp}) \cdot V_{DSATp}}{k_n (V_{DD} - V_{Tn})} \approx \frac{\mu_p \cdot W_p}{\mu_n \cdot W_n} \cdot |V_{DSATp}| \tag{6.22}$$



In order to make V_OL as small as possible, the PMOS device should be sized much
smaller than the NMOS pull-down devices. Unfortunately, this has a negative impact on 
the propagation delay for charging up the output node since the current provided by the 
PMOS device is limited. 

A major disadvantage of the pseudo-NMOS gate is the static power that is dissipated
when the output is low through the direct current path that exists between V_DD and
GND. The static power consumption in the low-output mode is easily derived:

$$P_{low} = V_{DD} I_{low} \approx V_{DD} \cdot \left| k_p \left( (-V_{DD} - V_{Tp}) \cdot V_{DSATp} - \frac{V_{DSATp}^2}{2} \right) \right| \tag{6.23}$$
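
Equations (6.21) to (6.23) are easy to evaluate numerically. The sketch below solves Eq. (6.21) as a quadratic in V_OL, evaluates the first-order estimate of Eq. (6.22), and computes the static power of Eq. (6.23). All device values (supply, thresholds, V_DSATp, and the process transconductances of 115 and 30 µA/V²) are illustrative assumptions, not data taken from this section, and the function name is hypothetical.

```python
import math

def pseudo_nmos_low(kn, kp, vdd=2.5, vtn=0.43, vtp=-0.4, vdsatp=-1.0):
    """Return (V_OL from Eq. 6.21, V_OL from Eq. 6.22, static power from Eq. 6.23).

    kn and kp are device transconductances (k' * W/L) in A/V^2; all defaults
    are assumed example values.
    """
    # Saturated PMOS load current: right-hand side of Eq. (6.21).
    i_load = kp * ((-vdd - vtp) * vdsatp - vdsatp ** 2 / 2)
    drive = vdd - vtn
    disc = drive ** 2 - 2 * i_load / kn
    if disc < 0:
        raise ValueError("load too strong: NMOS is not in the linear region")
    # Eq. (6.21) is quadratic in V_OL; the physical root is the one near 0 V.
    vol_exact = drive - math.sqrt(disc)
    # Eq. (6.22): first-order estimate.
    vol_approx = kp * (-vdd - vtp) * vdsatp / (kn * drive)
    p_static = vdd * abs(i_load)
    return vol_exact, vol_approx, p_static

kn = 115e-6 * 2.0                      # assumed NMOS with W/L = 2
for wl_p in (4, 2, 1, 0.5, 0.25):      # PMOS W/L sweep
    vol, vol_1st, power = pseudo_nmos_low(kn, 30e-6 * wl_p)
    print(f"W/L={wl_p}: V_OL={vol:.3f} V (first order {vol_1st:.3f} V), "
          f"P={power * 1e6:.0f} uW")
```

The sweep shows the expected trend under these assumptions: shrinking the PMOS load lowers both V_OL and the static power, at the price of a weaker pull-up current.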



Example 6.7 Pseudo-NMOS Inverter 

Consider a simple pseudo-NMOS inverter (where the PDN in Figure 6.27 degenerates
to a single transistor) with an NMOS size of 0.5 µm/0.25 µm. The effect of sizing the
PMOS device is studied in this example to demonstrate the impact on various parameters.
The W/L ratio of the grounded PMOS is varied over values from 4, 2, 1, 0.5 to 0.25. Devices
with a W/L < 1 are constructed by making the length longer than the width. The voltage
transfer curve for the different sizes is plotted in Figure 6.28.

Table 6.8 summarizes the nominal output voltage (V_OL), static power dissipation, and
the low-to-high propagation delay. The low-to-high delay is measured as the time to reach
1.25 V from V_OL (which is not 0 V for this inverter). This is chosen since the load gate is a
CMOS inverter with a switching threshold of 1.25 V. The trade-off between the static and
dynamic properties is apparent. A larger pull-up device improves performance, but increases
static power dissipation and lowers noise margins (i.e., increases V_OL).

Figure 6.28 Voltage-transfer curves of the pseudo-NMOS inverter as a function of the PMOS size.



Table 6.8 Performance of a pseudo-NMOS inverter.

Size      V_OL        Static Power Dissipation     t_pLH
4         0.693 V     564 µW                       14 ps
2         0.273 V     298 µW                       56 ps
1         0.133 V     160 µW                       123 ps
0.5       0.064 V     80 µW                        268 ps
0.25      0.031 V     41 µW                        569 ps



Notice that the simple first-order model to predict V_OL is quite effective. For a
PMOS W/L of 4, V_OL is given by (30/115)(4)(0.63 V) = 0.66 V.



The static power dissipation of pseudo-NMOS limits its use. However, pseudo-NMOS
still finds use in large fan-in circuits. Figure 6.29 shows the schematics of pseudo-NMOS
NOR and NAND gates. When area is most important, the reduced transistor count
compared to complementary CMOS is quite attractive.

Figure 6.29 Four-input pseudo-NMOS NOR and NAND gates.

Problem 6.4 NAND Versus NOR in Pseudo-NMOS

Given the choice between NOR or NAND logic, which one would you prefer for implementa- 
tion in pseudo-NMOS? 



How to Build Even Better Loads 



It is possible to create a ratioed logic style that completely eliminates static currents and 
provides rail-to-rail swing. Such a gate combines two concepts: differential logic and pos- 
itive feedback. A differential gate requires that each input is provided in complementary 
format, and produces complementary outputs in turn. The feedback mechanism ensures 
that the load device is turned off when not needed. An example of such a logic family,
called Differential Cascode Voltage Switch Logic (or DCVSL), is presented conceptually 
in Figure 6.30a [Heller84]. 

The pull-down networks PDN1 and PDN2 use NMOS devices and are mutually
exclusive (that is, when PDN1 conducts, PDN2 is off, and when PDN1 is off, PDN2
conducts), such that the required logic function and its inverse are simultaneously
implemented. Assume now that, for a given set of inputs, PDN1 conducts while PDN2 does not,
and that Out and $\overline{Out}$ are initially high and low, respectively. Turning on PDN1 causes
Out to be pulled down, although there is still a fight between M1 and PDN1. $\overline{Out}$ is in a
high-impedance state, as M2 and PDN2 are both turned off. PDN1 must be strong enough
to bring Out below V_DD - |V_Tp|, the point at which M2 turns on and starts charging $\overline{Out}$ to
V_DD, eventually turning off M1. This in turn enables Out to discharge all the way to GND.
Figure 6.30b shows an example of an XOR/XNOR gate. Notice that it is possible to share
transistors among the two pull-down networks, which reduces the implementation
overhead.
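
The steady-state behavior described above can be captured in a small behavioral model. This is only a sketch: it ignores the transient fight between the PMOS load and the pull-down network, and it assumes that PDN1 is attached to Out, so that an XOR output requires PDN1 to realize the XNOR function. The function names are hypothetical.

```python
from itertools import product

def dcvsl(pdn1, pdn2, *inputs):
    """Steady-state (Out, Out_bar) of a DCVSL gate for ideal switches."""
    c1, c2 = bool(pdn1(*inputs)), bool(pdn2(*inputs))
    if c1 == c2:
        raise ValueError("pull-down networks must be mutually exclusive")
    out = 0 if c1 else 1        # PDN1 discharges Out
    out_bar = 0 if c2 else 1    # PDN2 discharges the complementary output
    return out, out_bar

def pdn_xnor(a, b):
    """Conducts when A = B: series A, B in parallel with series A_bar, B_bar."""
    return (a and b) or ((1 - a) and (1 - b))

def pdn_xor(a, b):
    """Conducts when A != B: series A, B_bar in parallel with series A_bar, B."""
    return (a and (1 - b)) or ((1 - a) and b)

for a, b in product((0, 1), repeat=2):
    print(a, b, dcvsl(pdn_xnor, pdn_xor, a, b))
```

The mutual-exclusivity check in `dcvsl` reflects the requirement that exactly one pull-down network conducts for every input combination; the side that is not pulled down is restored to V_DD by the cross-coupled load.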




Figure 6.30 DCVSL logic gate. (a) Basic principle; (b) XOR/XNOR gate.



The resulting circuit exhibits a rail-to-rail swing, and the static power dissipation is 
eliminated: in steady state, none of the stacked pull-down networks and load devices are 
simultaneously conducting. However, the circuit is still ratioed since the sizing of the 
PMOS devices relative to the pull-down devices is critical to functionality, not just
performance. In addition to the problem of increased design complexity, this circuit style still
has a power-dissipation problem that is due to cross-over currents. During the transition, 
there is a period of time when PMOS and PDN are turned on simultaneously, producing a 
short circuit path. 

Example 6.8 DCVSL Transient Response 

An example transient response is shown for an AND/NAND gate in DCVSL. Notice
that as Out is pulled down to V_DD - |V_Tp|, $\overline{Out}$ starts to charge up to V_DD quickly. The
delay from the input to Out is 197 ps and to $\overline{Out}$ is 321 ps. A static CMOS AND
gate (NAND followed by an inverter) has a delay of 200 ps.





Figure 6.31 Transient response of a simple AND/NAND DCVSL gate. M1 and M2 are
1 µm/0.25 µm, M3 and M4 are 0.5 µm/0.25 µm, and the cross-coupled PMOS devices are
1.5 µm/0.25 µm.



Design Consideration: Single-ended versus Differential 



The DCVSL gate provides differential (or complementary) outputs. Both the output signal 
(V_out1) and its inverted value (V_out2) are simultaneously available. This is a distinct advantage,
as it eliminates the need for an extra inverter to produce the complementary signal. It has been 
observed that a differential implementation of a complex function may reduce the number of 
gates required by a factor of two! The number of gates in the critical timing path is often 
reduced as well. Finally, the approach prevents some of the time-differential problems intro- 
duced by additional inverters. For example, in logic design it often happens that both a signal 
and its complement are needed simultaneously. When the complementary signal is generated 
using an inverter, the inverted signal is delayed with respect to the original (Figure 6.32a). This 
causes timing problems, especially in very high-speed designs. The differential output capabil- 
ity avoids this problem (Figure 6.32b). 

With all these positive properties, why not always use differential logic? Well, the differ- 
ential nature virtually doubles the number of wires that has to be routed, leading very often to 
unwieldy designs (on top of the additional implementation overhead in the individual gates). 
Additionally, the dynamic power dissipation is high. 
Figure 6.32 Advantage of differential (b) over single-ended (a) gate.



6.2.3 Pass-Transistor Logic 

Pass-Transistor Basics 

A popular and widely-used alternative to complementary CMOS is pass-transistor logic, 
which attempts to reduce the number of transistors required to implement logic by allow- 
ing the primary inputs to drive gate terminals as well as source/drain terminals 
[Radhakrishnan85]. This is in contrast to logic families that we have studied so far, which 
only allow primary inputs to drive the gate terminals of MOSFETs.

Figure 6.33 shows an implementation of the AND function constructed that way, using
only NMOS transistors. In this gate, if the B input is high, the top transistor is turned on
and copies the input A to the output F. When B is low, the bottom pass transistor is turned
on and passes a 0. The switch driven by $\overline{B}$ seems to be redundant at first glance. Its
presence is essential to ensure that the gate is static, that is, that a low-impedance path
exists to the supply rails under all circumstances, or, in this particular case, when B is low.

The promise of this approach is that fewer transistors are required to implement a given 
function. For example, the implementation of the AND gate in Figure 6.33 requires 4 tran- 
sistors (including the inverter required to invert B), while a complementary CMOS imple- 
mentation would require 6 transistors. The reduced number of devices has the additional 
advantage of lower capacitance. 
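
A switch-level model shows why the second pass transistor is needed. The sketch below treats both NMOS devices as ideal switches (no threshold drop) and checks that exactly one of them drives the output for every input combination; the function name is an assumption made for illustration.

```python
from itertools import product

def pass_transistor_and(a, b):
    """Ideal switch-level model of the pass-transistor AND gate: F = A.B."""
    b_bar = 1 - b              # produced by the inverter on B
    drivers = []
    if b:                      # top NMOS on: passes input A to the output
        drivers.append(a)
    if b_bar:                  # bottom NMOS on: passes a 0 to the output
        drivers.append(0)
    if len(drivers) != 1:
        raise RuntimeError("output is floating or in contention")
    return drivers[0]

for a, b in product((0, 1), repeat=2):
    print(a, b, pass_transistor_and(a, b))
```

Removing the bottom switch in this model would leave `drivers` empty whenever B is low, which corresponds to the floating (non-static) output that the text warns about.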

Unfortunately, as discussed earlier, an NMOS device is effective at passing a 0 but
is poor at pulling a node to V_DD. When the pass transistor pulls a node high, the output
only charges up to V_DD - V_Tn. In fact, the situation is worsened by the fact that the devices
experience body effect, as there exists a significant source-to-body voltage when pulling
high. Consider the case when the pass transistor is charging up a node with the gate and
drain terminals set at V_DD. Let the source of the NMOS pass transistor be labeled x. Node
x will charge up to V_DD - V_Tn(V_x):



$$V_x = V_{DD} - \left( V_{Tn0} + \gamma \left( \sqrt{|2\phi_f| + V_x} - \sqrt{|2\phi_f|} \right) \right) \tag{6.24}$$

Figure 6.33 Pass-transistor implementation of an AND gate (F = AB).
Example 6.9 Voltage swing for pass-transistor circuits

Assuming a power supply voltage of 2.5 V, the transient response of Figure 6.34 shows the
output of an NMOS charging up (where the drain voltage is at V_DD and the gate voltage IN is
ramped from 0 V to V_DD). Assume that node x was initially 0. Also notice that if IN is low,
node x is in a high-impedance state (not driven to one of the rails using a low-resistance path).
Extra transistors can be added to provide a path to GND, but for this discussion, the simplified
circuit is sufficient. Notice that the output charges up quickly initially, but has a slow tail. This
is attributed to the fact that the drive (gate-to-source voltage) reduces significantly as the
output approaches V_DD - V_Tn and the current available to charge up node x reduces drastically.
Hand calculation using Eq. (6.24) results in an output voltage of 1.8 V, which comes close to
the simulated value.

Figure 6.34 Transient response of charging up a node using an N device. Notice the slow tail
after an initial quick response. (Circuit annotations: nodes IN, x, and Out; device sizes
0.5 µm/0.25 µm, 1.5 µm/0.25 µm, and 0.5 µm/0.25 µm.)
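
Because V_x appears on both sides of Eq. (6.24), the hand calculation amounts to a short fixed-point iteration. The sketch below uses assumed device parameters (V_Tn0 = 0.43 V, γ = 0.4 V^0.5, |2φ_f| = 0.6 V) that are not given in this section; with other values the result will differ.

```python
import math

def pass_high_level(vdd=2.5, vt0=0.43, gamma=0.4, two_phi_f=0.6, tol=1e-9):
    """Solve Eq. (6.24) for V_x by fixed-point iteration (assumed parameters)."""
    vx = vdd - vt0                       # first guess: threshold without body effect
    while True:
        # Threshold voltage raised by the source-to-body voltage V_x.
        vt = vt0 + gamma * (math.sqrt(two_phi_f + vx) - math.sqrt(two_phi_f))
        nxt = vdd - vt
        if abs(nxt - vx) < tol:
            return nxt
        vx = nxt

print(f"V_x = {pass_high_level():.3f} V")
```

With these assumed parameters the iteration settles at roughly 1.76 V, in line with the value of about 1.8 V quoted in the example.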



WARNING: 

The above example demonstrates that pass-transistor gates cannot be cascaded by connecting
the output of a pass gate to the gate input of another pass transistor. This is
illustrated in Figure 6.35a, where the output of M1 (node x) drives the gate of another
MOS device. Node x can charge up to V_DD - V_Tn1. If node C has a rail-to-rail swing, node Y
only charges up to the voltage on node x minus V_Tn2, which works out to V_DD - V_Tn1 - V_Tn2.
Figure 6.35b, on the other hand, has the output of M1 (x) driving the junction of M2, and there
is only one threshold drop. This is the proper way of cascading pass gates.




Figure 6.35 Pass transistor output (Drain/Source) terminal should not drive other gate terminals to 
avoid multiple threshold drops. 



